
Querying the Internet with PIER
Ryan Huebsch, Joseph M. Hellerstein, Ion Stoica, Nick Lanham, Boon Thau Loo, Scott Shenker
Speaker: Natalia Kozlova. Tutor: Matthias Bender

Querying the Internet with PIER. Natalia Kozlova
Outline: Introduction; What is PIER?; Design Principles; Implementation (DHT, Query Processor); Performance; Summary

Introduction
Databases offer powerful query facilities and a declarative interface, but have the potential to scale up to only a few hundred computers.
What about the Internet? We want a well-distributed system that has query facilities (SQL), fault tolerance and flexibility.
PIER is a query engine that scales up to thousands of participating nodes and can work on various data.

What is PIER?
Peer-to-Peer Information Exchange and Retrieval.
A query engine that runs on top of a P2P network.
A step toward distributed query processing at a much larger scale.
A way toward massive distribution: querying heterogeneous data.
Its architecture combines traditional database query processing with recent peer-to-peer technologies.

Design Principles
Relaxed consistency: consistency is relaxed in exchange for availability of the system.
Organic scaling: no need for a priori allocation of a data center.
Natural habitats for data: data stays where it lives (no DB schemas required), e.g. in a file system or perhaps a live feed.
Standard schemas via grassroots software: widespread programs provide de facto standards.

Outline: Introduction; What is PIER?; Design Principles; Implementation (DHT, Query Engine); Scalability; Summary

Implementation – DHT (based on CAN)
DHT structure: routing layer, storage manager, provider.

DHT – Routing & Storage
Routing layer: maps a key to the IP address of the node currently responsible for that key. It provides exact lookups only and calls back higher levels when the set of keys mapped to the node has changed.
Routing layer API: lookup(key) → ipaddr; join(landmarkNode); leave(); locationMapChange (callback).
Storage manager: stores and retrieves records, which consist of key/value pairs. Keys are used to locate items and can be any supported data type or structure.
Storage manager API: store(key, item); retrieve(key) → item; remove(key).
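The two lower layers can be sketched as follows. This is a toy, single-process illustration, not PIER's code: it assumes a 2-D CAN-like key space where a key is hashed (SHA-1, an assumed choice) to a point in the unit square and each node owns one rectangular zone. The names `key_to_point`, `Zone` and `location_map_change` are hypothetical, and `join` takes a zone directly instead of contacting a landmark node.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def key_to_point(key: str) -> Tuple[float, float]:
    """Hash a key to a point in the unit square [0, 1) x [0, 1) (CAN-like, assumed)."""
    digest = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64
    y = int.from_bytes(digest[8:16], "big") / 2**64
    return x, y


@dataclass
class Zone:
    """Axis-aligned rectangle [x0, x1) x [y0, y1) owned by one node."""
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


class RoutingLayer:
    """Toy routing layer: the whole zone map is known locally, so lookup is a
    table scan rather than CAN's multi-hop greedy routing."""

    def __init__(self) -> None:
        self.zones: Dict[str, Zone] = {}  # ipaddr -> zone
        self.location_map_change: List[Callable[[], None]] = []

    def join(self, ipaddr: str, zone: Zone) -> None:
        # Simplification: the caller hands over the zone directly.
        self.zones[ipaddr] = zone
        self._notify()

    def leave(self, ipaddr: str) -> None:
        self.zones.pop(ipaddr, None)
        self._notify()

    def lookup(self, key: str) -> str:
        """Exact lookup: return the ipaddr of the node owning the key's point."""
        point = key_to_point(key)
        for ipaddr, zone in self.zones.items():
            if zone.contains(point):
                return ipaddr
        raise KeyError(f"no node currently owns {key!r}")

    def _notify(self) -> None:
        # locationMapChange upcall: the set of locally mapped keys changed.
        for callback in self.location_map_change:
            callback()


class StorageManager:
    """Local key/value store on one node."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def store(self, key: str, item: object) -> None:
        self._items[key] = item

    def retrieve(self, key: str) -> object:
        return self._items.get(key)

    def remove(self, key: str) -> None:
        self._items.pop(key, None)


routing = RoutingLayer()
changes: List[str] = []
routing.location_map_change.append(lambda: changes.append("changed"))
routing.join("10.0.0.1", Zone(0.0, 0.5, 0.0, 1.0))
routing.join("10.0.0.2", Zone(0.5, 1.0, 0.0, 1.0))
owner = routing.lookup("R/42")
storage = StorageManager()
storage.store("R/42", {"id": 42})
```

Keeping routing and storage separate mirrors the slide's layering: the routing layer only answers "who owns this key", and the storage manager never needs to know about other nodes.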

DHT – Provider (1)
The provider ties together the routing layer and the storage manager and provides the interface to the layers above.
Each object in the DHT has a namespace, a resourceID and an instanceID; DHT key = hash(namespace, resourceID).
namespace: the application or group the object belongs to, e.g. a table.
resourceID: identifies the object, e.g. its primary key or any other attribute.
instanceID: an integer that separates items with the same namespace and resourceID.
CAN's mapping of resourceID to object is equivalent to an index.
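A minimal sketch of the naming scheme, assuming SHA-1 as the hash and a zero byte as the separator between namespace and resourceID (both are illustrative choices, not taken from the slides). Because instanceID is not hashed, all items sharing a namespace and resourceID land under the same DHT key, which is why publishing a table with resourceID = some attribute behaves like an index on that attribute. The `storage` dict and `put` helper are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict


def dht_key(namespace: str, resource_id: str) -> int:
    """DHT key = hash(namespace, resourceID); SHA-1 is an assumed choice."""
    h = hashlib.sha1()
    h.update(namespace.encode())
    h.update(b"\x00")  # separator: ("ab", "c") must not collide with ("a", "bc")
    h.update(resource_id.encode())
    return int.from_bytes(h.digest(), "big")


# dht_key -> {instanceID: item}. instanceID is not part of the key, so items
# with the same namespace and resourceID are co-located and told apart only
# by instanceID.
storage: Dict[int, Dict[int, dict]] = defaultdict(dict)


def put(namespace: str, resource_id: str, instance_id: int, item: dict) -> None:
    storage[dht_key(namespace, resource_id)][instance_id] = item


# Table R (namespace "R") published with resourceID = its join attribute sid:
# all R tuples with the same sid share one key, i.e. an index on sid.
put("R", "sid=7", 1, {"rid": 1, "sid": 7})
put("R", "sid=7", 2, {"rid": 2, "sid": 7})
put("R", "sid=8", 1, {"rid": 3, "sid": 8})
```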

DHT – Provider (2)
Provider API:
get(namespace, resourceID) → item
put(namespace, resourceID, item, lifetime)
renew(namespace, resourceID, instanceID, lifetime) → bool
multicast(namespace, resourceID, item)
lscan(namespace) → items
newData(namespace, item) (callback)
[Figure: table R (namespace) partitioned across nodes: node R1 holds tuples 1..n and node R2 holds tuples n+1..m; each node stores items under their resourceIDs (rID1, rID2, rID3, ...).]
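The provider calls can be illustrated with a small in-process simulation. This is a sketch under several assumptions: ownership is `dht_key % N` instead of CAN routing, instanceIDs come from a global counter, time comes from an injected clock, and `Cluster`, `new_data` and `on_multicast` are hypothetical names. It shows the intended semantics: items are soft state that expire after their lifetime unless renewed, `lscan` sees only the local node's items, and `newData` fires at the node that receives a `put`.

```python
import hashlib
import itertools
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


def dht_key(namespace: str, resource_id: str) -> int:
    digest = hashlib.sha1(f"{namespace}\x00{resource_id}".encode()).digest()
    return int.from_bytes(digest, "big")


class Provider:
    """Provider on one node (toy). Items expire after their lifetime unless renewed."""

    def __init__(self, cluster: "Cluster", clock: Callable[[], float]) -> None:
        self.cluster = cluster
        self.clock = clock
        # (namespace, resourceID) -> {instanceID: (item, expires_at)}
        self.items: Dict[Tuple[str, str], Dict[int, Tuple[Any, float]]] = defaultdict(dict)
        self.new_data: Dict[str, List[Callable]] = defaultdict(list)
        self.on_multicast: Dict[str, List[Callable]] = defaultdict(list)

    def get(self, namespace: str, resource_id: str) -> List[Any]:
        owner = self.cluster.owner(namespace, resource_id)
        now = self.clock()
        bucket = owner.items.get((namespace, resource_id), {})
        return [item for item, expires in bucket.values() if expires > now]

    def put(self, namespace: str, resource_id: str, item: Any, lifetime: float) -> int:
        owner = self.cluster.owner(namespace, resource_id)
        instance_id = next(self.cluster.instance_ids)
        owner.items[(namespace, resource_id)][instance_id] = (item, self.clock() + lifetime)
        for callback in owner.new_data[namespace]:  # newData upcall at the owner
            callback(namespace, item)
        return instance_id

    def renew(self, namespace: str, resource_id: str, instance_id: int, lifetime: float) -> bool:
        owner = self.cluster.owner(namespace, resource_id)
        bucket = owner.items.get((namespace, resource_id), {})
        entry = bucket.get(instance_id)
        if entry is None or entry[1] <= self.clock():
            return False  # unknown or already expired
        bucket[instance_id] = (entry[0], self.clock() + lifetime)
        return True

    def multicast(self, namespace: str, resource_id: str, item: Any) -> None:
        for node in self.cluster.nodes:  # deliver to every node
            for handler in node.on_multicast[namespace]:
                handler(namespace, resource_id, item)

    def lscan(self, namespace: str) -> List[Any]:
        """Local scan: only the live items stored on this node."""
        now = self.clock()
        return [item
                for (ns, _), bucket in self.items.items() if ns == namespace
                for item, expires in bucket.values() if expires > now]


class Cluster:
    """Stand-in for the routing layer: owner = key mod N (not CAN routing)."""

    def __init__(self, size: int, clock: Callable[[], float]) -> None:
        self.instance_ids = itertools.count(1)
        self.nodes = [Provider(self, clock) for _ in range(size)]

    def owner(self, namespace: str, resource_id: str) -> Provider:
        return self.nodes[dht_key(namespace, resource_id) % len(self.nodes)]


now = [0.0]
cluster = Cluster(4, clock=lambda: now[0])
node = cluster.nodes[0]
iid = node.put("R", "7", {"rid": 7}, lifetime=30)
```

Injecting the clock keeps the lifetime logic testable: advancing `now[0]` past the expiry makes the item disappear from `get` and `lscan` without any explicit delete.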

Implementation – Query Engine (query processor)
QP structure: core engine, query optimizer, catalog manager.

Query Processor
How does it work? It performs selection, projection, joins, grouping and aggregation. Multiple operators execute simultaneously, pipelined together, and results are produced and queued as quickly as possible.
How does it modify data? It inserts, updates and deletes items via the DHT interface.
How does it select the data to process? It uses a dilated-reachable snapshot: the data published by the nodes that are reachable at the time the query arrives.
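Pipelining can be sketched with Python generators: each operator pulls one tuple at a time from its child and passes it on immediately, so the first result appears before the scan finishes. This shows tuple-at-a-time pipelining within one process, not PIER's actual dataflow engine or its parallelism; the operator names and the `tracked_scan` helper are illustrative. Grouping is included to show that some operators are inherently blocking.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for row in rows:
        if predicate(row):
            yield row  # passed downstream immediately, no buffering


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    for row in rows:
        yield {c: row[c] for c in columns}


def group_count(rows: Iterable[Row], key: str) -> Iterator[tuple]:
    """Grouping is blocking: it must see all input before emitting."""
    counts: Dict[object, int] = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    yield from counts.items()


data = [
    {"src": "a", "bytes": 10},
    {"src": "b", "bytes": 1},
    {"src": "a", "bytes": 7},
]
pulled: List[Row] = []


def tracked_scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        pulled.append(row)  # record how far the scan has advanced
        yield row


pipeline = project(select(tracked_scan(data), lambda r: r["bytes"] > 5), ["src"])
first = next(pipeline)  # {"src": "a"}, produced after scanning only one row
```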

Query Processor – Joins (1)
Symmetric hash join. At each site:
(Scan) lscan N_R and N_S.
(Rehash) put a copy of each eligible tuple into N_Q.
(Listen) use newData to see the rehashed tuples in N_Q.
(Compute) join the tuples as they arrive at N_Q.
*The basic approach; uses a lot of network resources.
[Figure: Join(R, S, R.sid = S.id): the query is multicast to all nodes N_X; nodes run lscan(N_R) and lscan(N_S), put R and S tuples into N_Q, and the N_Q nodes receive them via newData.]
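A single-process sketch of the symmetric hash join steps above. Assumptions: Python's built-in `hash()` of the join value stands in for hashing into namespace N_Q, the two scans are interleaved with `zip_longest` to mimic concurrent arrival, and `SymmetricHashJoinSite` is a hypothetical name. Each arriving tuple is inserted into its own hash table and probes the other one, so every matching pair is emitted exactly once, when its second tuple arrives. The `messages` counter makes the cost visible: every eligible tuple of both relations is shipped.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Tuple

Row = Dict[str, object]


class SymmetricHashJoinSite:
    """Join state at one N_Q node: one hash table per input relation."""

    def __init__(self, r_key: str, s_key: str) -> None:
        self.r_key, self.s_key = r_key, s_key
        self.r_table: Dict[object, List[Row]] = defaultdict(list)
        self.s_table: Dict[object, List[Row]] = defaultdict(list)

    def new_data(self, side: str, tup: Row) -> List[Tuple[Row, Row]]:
        """newData upcall: insert into own table, probe the other one."""
        if side == "R":
            k = tup[self.r_key]
            self.r_table[k].append(tup)
            return [(tup, s) for s in self.s_table.get(k, [])]
        k = tup[self.s_key]
        self.s_table[k].append(tup)
        return [(r, tup) for r in self.r_table.get(k, [])]


def symmetric_hash_join(r_rows, s_rows, r_key, s_key, num_sites):
    sites = [SymmetricHashJoinSite(r_key, s_key) for _ in range(num_sites)]
    results, messages = [], 0
    r_stream = (("R", t) for t in r_rows)
    s_stream = (("S", t) for t in s_rows)
    # Interleave arrivals to mimic tuples streaming in from both scans.
    for pair in zip_longest(r_stream, s_stream):
        for arrival in pair:
            if arrival is None:
                continue
            side, tup = arrival
            join_value = tup[r_key] if side == "R" else tup[s_key]
            site = sites[hash(join_value) % num_sites]  # (Rehash) put into N_Q
            messages += 1  # every eligible tuple crosses the network once
            results.extend(site.new_data(side, tup))
    return results, messages


R = [{"rid": i, "sid": i % 3} for i in range(6)]
S = [{"id": j, "name": f"s{j}"} for j in range(3)]
pairs, sent = symmetric_hash_join(R, S, "sid", "id", num_sites=4)
```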

Query Processor – Joins (2)
Fetch matches. At each site:
(Scan) lscan(N_R).
(Get) for each suitable R tuple, get the matching S tuple.
When the S tuples arrive at the R site, join them.
Pass on the results.
*Retrieves only the tuples that match.
[Figure: Join(R, S, R.sid = S.id): the query is multicast to all nodes N_X; N_R nodes issue get(rID) against the hashed relation S (N_S) and receive S tuples back.]
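A sketch of fetch matches at one R site. It assumes, as the figure suggests, that S is already stored in the DHT with resourceID = its join attribute, simulated here by a dict (`publish_s` is a hypothetical helper). Each qualifying R tuple costs one `get`, and only matching S tuples travel back, which is where the saving over the symmetric hash join comes from.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def publish_s(s_rows: List[Row], s_key: str) -> Dict[object, List[Row]]:
    """Stand-in for S already stored in the DHT with resourceID = S.<s_key>."""
    dht: Dict[object, List[Row]] = defaultdict(list)
    for s in s_rows:
        dht[s[s_key]].append(s)
    return dht


def fetch_matches(r_fragment, s_dht, r_key, predicate=lambda r: True):
    results: List[Tuple[Row, Row]] = []
    gets = s_transferred = 0
    for r in r_fragment:                         # (Scan) lscan(N_R) on this node
        if not predicate(r):
            continue
        gets += 1                                # (Get) one get per suitable R tuple
        matches = s_dht.get(r[r_key], [])        # only matching S tuples come back
        s_transferred += len(matches)
        results.extend((r, s) for s in matches)  # join at the R site
    return results, gets, s_transferred


S = [{"id": j} for j in range(10)]
R = [{"rid": 0, "sid": 0}, {"rid": 1, "sid": 1}, {"rid": 2, "sid": 100}]
pairs, gets, moved = fetch_matches(R, publish_s(S, "id"), "sid")
```

The trade-off is latency: each get is a DHT lookup, so this pays off when S is already hashed on the join attribute and few R tuples qualify.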

Performance: Join Algorithms
Setup: |R| + |S| = 25 GB; n = m = 1024; inbound capacity = 10 Mbps; hop latency = 100 ms.
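A back-of-envelope bound helps read these parameters. Assuming (not stated on the slide) that the 25 GB (taken as 25·10^9 bytes) is spread evenly over 1024 nodes and that a fully rehashing strategy such as the symmetric hash join ships every byte, each node must receive about 24.4 MB through a 10 Mbps inbound link. The `hops` term is a hypothetical add-on for per-hop latency.

```python
def rehash_time_lower_bound(total_bytes: float, nodes: int, inbound_bps: float,
                            hops: int = 0, hop_latency_s: float = 0.1) -> float:
    """Seconds for one node to receive its share, assuming perfect balance and
    that every byte is rehashed (both are assumptions)."""
    per_node_bits = total_bytes * 8 / nodes
    return per_node_bits / inbound_bps + hops * hop_latency_s


t = rehash_time_lower_bound(total_bytes=25e9, nodes=1024, inbound_bps=10e6)
print(f"{t:.1f} s")  # 19.5 s
```

Under these assumptions the inbound bandwidth, not the 100 ms hop latency, dominates, which is why strategies that ship fewer tuples matter.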

Query Processor – Join rewriting
Symmetric semi-join:
(Project) project both R and S to their resourceIDs and join keys.
(Small rehash) perform an SHJ on the two projections.
(Compute) send the results into a fetch-matches (FM) join for each of the tables.
*Minimizes initial communication.
Bloom joins:
(Scan) create a Bloom filter for each local fragment of a relation.
(Put) publish the filters for R and S.
(Multicast) distribute the filters.
(Rehash) rehash only the tuples that match the filter.
(Compute) run an SHJ.
*Reduces rehashing.
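The Bloom join steps can be sketched in one process. Assumptions: a 1024-bit filter with 3 salted SHA-256 hashes (arbitrary sizes), local filters combined by bitwise OR to stand in for Put plus Multicast, and a plain hash join standing in for the final SHJ. Because Bloom filters have no false negatives, no join result is lost; false positives only cost a few extra rehashed tuples, which the final equality check discards.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: object):
        data = repr(value).encode()
        for i in range(self.k):  # k salted hashes derived from SHA-256
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value: object) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: object) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        merged = BloomFilter(self.m, self.k)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


def relation_filter(fragments: List[List[Row]], key: str) -> BloomFilter:
    combined = BloomFilter()
    for fragment in fragments:            # (Scan) one local filter per fragment
        local = BloomFilter()
        for t in fragment:
            local.add(t[key])
        combined = combined.union(local)  # (Put) + (Multicast): everyone gets the OR
    return combined


def bloom_join(r_fragments, s_fragments, r_key, s_key):
    r_filter = relation_filter(r_fragments, r_key)
    s_filter = relation_filter(s_fragments, s_key)
    # (Rehash) only tuples that pass the other relation's filter.
    r_sent = [t for f in r_fragments for t in f if s_filter.might_contain(t[r_key])]
    s_sent = [t for f in s_fragments for t in f if r_filter.might_contain(t[s_key])]
    # (Compute) a plain hash join stands in for the SHJ on the survivors.
    table: Dict[object, List[Row]] = defaultdict(list)
    for s in s_sent:
        table[s[s_key]].append(s)
    results = [(r, s) for r in r_sent for s in table.get(r[r_key], [])]
    return results, len(r_sent) + len(s_sent)


R = [{"rid": i, "sid": i} for i in range(100)]
S = [{"id": j} for j in range(5)]
pairs, rehashed = bloom_join([R[:50], R[50:]], [S[:3], S[3:]], "sid", "id")
```

Here only the few R tuples whose sid appears in S (plus rare false positives) are rehashed, instead of all 100.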

Performance: Join Algorithms
Setup: |R| + |S| = 25 GB; n = m = 1024; inbound capacity = 10 Mbps; hop latency = 100 ms.

Outline: Introduction; What is PIER?; Design Principles; Implementation (DHT, Query Processor); Scalability; Summary

Scalability
Simulation conditions: |R| = 10 |S|; the constants are chosen to produce a selectivity of 50%.
Query:
SELECT R.key, S.key, R.pad
FROM R, S
WHERE R.n1 = S.key AND R.n2 > const1 AND S.n2 > const2 AND f(R.n3, S.n3) > const3
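One way to obtain 50% selectivity is to set each constant to the median of the values it is compared against, since exactly half of a set of distinct values lies strictly above its median. The sketch below evaluates the benchmark query centrally on hypothetical synthetic data with |R| = 10 |S|. The data generator and the choice f(a, b) = a + b are assumptions, because the slide does not define f or the data distribution, and this is not how the authors necessarily chose their constants.

```python
import random
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, object]


def make_relations(s_size: int, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Hypothetical synthetic data with |R| = 10 |S|."""
    rng = random.Random(seed)
    S = [{"key": k, "n2": rng.random(), "n3": rng.random()} for k in range(s_size)]
    R = [{"key": k, "n1": rng.randrange(s_size), "n2": rng.random(),
          "n3": rng.random(), "pad": "x"} for k in range(10 * s_size)]
    return R, S


def f(a: float, b: float) -> float:
    return a + b  # hypothetical: the slide leaves f unspecified


def join_candidates(R, S, const1, const2):
    s_by_key = {s["key"]: s for s in S if s["n2"] > const2}
    return [(r, s_by_key[r["n1"]]) for r in R
            if r["n2"] > const1 and r["n1"] in s_by_key]


def choose_constants(R, S):
    # A constant at the median keeps half the rows for "x > const".
    const1 = statistics.median(r["n2"] for r in R)
    const2 = statistics.median(s["n2"] for s in S)
    pairs = join_candidates(R, S, const1, const2)
    const3 = statistics.median(f(r["n3"], s["n3"]) for r, s in pairs)
    return const1, const2, const3


def run_query(R, S, const1, const2, const3):
    # SELECT R.key, S.key, R.pad FROM R, S WHERE R.n1 = S.key AND ...
    return [(r["key"], s["key"], r["pad"])
            for r, s in join_candidates(R, S, const1, const2)
            if f(r["n3"], s["n3"]) > const3]


R, S = make_relations(100)
c1, c2, c3 = choose_constants(R, S)
rows = run_query(R, S, c1, c2, c3)
```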

Experimental Results
Equipment: a cluster of 64 PCs connected by a 1 Gbps network.
Result: the time to receive the 30th result tuple remains practically unchanged as both the system size and the load are scaled up.

Summary
PIER is a structured query system intended to run at a very large scale.
PIER queries data that already exists in the wide area.
The DHT is the core scalability mechanism for indexing, routing and query state management.
There is a large body of future work: caching, query optimization, security, …

The End
Thank you for your attention! Questions?