1 CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Measurement Studies Lecture 23 Reading: See links on website All Slides © IG.

Presentation transcript:

2 Motivation
We design algorithms, implement them, and deploy them as systems
But when you factor in the real world, unexpected characteristics may arise
It is important to understand these characteristics to build better distributed systems for the real world
We'll look at three systems: Kazaa (P2P system), AWS EC2 (Elastic Compute Cloud), Hadoop

3 How do you find characteristics of these systems in real-life settings?
Write a crawler to crawl a real working system (e.g., a P2P system), or run a benchmark (e.g., on Hadoop)
Collect traces from the crawler/benchmark
Tabulate the results
Papers contain plenty of information on how data was collected, the caveats, ifs and buts of the interpretation, etc.
–These are important, but we will ignore them for this lecture and concentrate on the raw data and conclusions

4 Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload Gummadi et al Department of Computer Science University of Washington Acknowledgments: Jay Patel

5 What They Did
Kazaa: popular P2P file-sharing system
The paper analyzed a 200-day trace of Kazaa traffic
Considered only requests going from U. Washington to the outside
Developed a model of multimedia workloads

6 User characteristics (1) Users are patient

7 User characteristics (2)
Users slow down as they age
–clients “die”
–older clients ask for less each time they use the system

8 User characteristics (3)
Client availability = time client present in system
–The tracing used could only detect users when their clients transfer data
–Thus, they only report statistics on client activity, which is a lower bound on availability
–Avg session lengths are typically small (median: 2.4 mins)
Many transactions fail

9 Object characteristics (1) Kazaa is not one workload Dichotomy: Small files vs. large files

10 Object characteristics (2)
Kazaa object dynamics
–Kazaa clients fetch objects at most once
–Popularity of objects is often short-lived
94% of the time, a Kazaa client requests a given object at most once
99% of the time, a Kazaa client requests a given object at most twice
–Most popular objects tend to be recently-born objects
–Most requests are for old objects (> 1 month)
72% old – 28% new for large objects
52% old – 48% new for small objects
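The "at most once" and "at most twice" percentages above are statistics over (client, object) pairs. A minimal sketch of how such a number could be computed from a request trace is given here; the trace format and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def fraction_requested_at_most(trace, limit):
    """Fraction of (client, object) pairs that were requested at most `limit` times.

    trace is a list of (client_id, object_id) tuples, one per request
    (an assumed format chosen for this sketch).
    """
    counts = Counter(trace)  # (client, object) -> number of requests
    return sum(1 for c in counts.values() if c <= limit) / len(counts)

# Hypothetical toy trace: four distinct (client, object) pairs.
trace = [
    ("c1", "o1"), ("c1", "o1"),                 # requested twice
    ("c1", "o2"),                               # requested once
    ("c2", "o1"),                               # requested once
    ("c2", "o3"), ("c2", "o3"), ("c2", "o3"),   # requested three times
]
print(fraction_requested_at_most(trace, 1))
print(fraction_requested_at_most(trace, 2))
```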

11 Object characteristics (3)
Kazaa is not Zipf, but it is heavy-tailed
Zipf’s law: popularity of the i-th most popular object is proportional to $i^{-\alpha}$ ($\alpha$: Zipf coefficient)
Web access patterns are Zipf
Authors conclude that Kazaa is not Zipf because of the at-most-once fetch characteristics
Caveat: what is an “object” in Kazaa?
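A small simulation can illustrate why fetch-at-most-once behavior pulls a workload away from Zipf: even if every client draws its requests from a Zipf distribution, each client can contribute at most one fetch per object, so the most popular objects are capped at the number of clients. This is only a sketch; the object count, client count, requests per client, and the choice of α = 1.0 are illustrative assumptions, not values from the paper.

```python
import random
from collections import Counter

def zipf_weights(n_objects, alpha):
    """Unnormalized Zipf popularity: the i-th most popular object gets weight i^(-alpha)."""
    return [i ** -alpha for i in range(1, n_objects + 1)]

def simulate(n_objects=1000, n_clients=200, requests_per_client=50,
             alpha=1.0, at_most_once=False, seed=0):
    """Count fetches per object when every client draws requests from a Zipf distribution.

    With at_most_once=True a client never fetches the same object twice;
    repeated draws of an already fetched object are simply skipped.
    """
    rng = random.Random(seed)
    weights = zipf_weights(n_objects, alpha)
    objects = range(n_objects)  # object 0 is the most popular
    fetches = Counter()
    for _ in range(n_clients):
        seen = set()
        for obj in rng.choices(objects, weights=weights, k=requests_per_client):
            if at_most_once and obj in seen:
                continue
            seen.add(obj)
            fetches[obj] += 1
    return fetches

repeat = simulate(at_most_once=False)
once = simulate(at_most_once=True)
print("fetches of most popular object, repeats allowed:", repeat[0])
print("fetches of most popular object, at most once:   ", once[0])
```

With repeats allowed, the head of the distribution follows the Zipf weights; with at-most-once fetching, the count for any object cannot exceed the number of clients, which flattens the head.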

12 Results Summary
1. Users are patient
2. Users slow down as they age
3. Kazaa is not one workload
4. Kazaa clients fetch objects at-most-once
5. Popularity of objects is often short-lived
6. Kazaa is not Zipf

13 An Evaluation of Amazon’s Grid Computing Services: EC2, S3, and SQS Simson L. Garfinkel SEAS, Harvard University

14 What They Did
Did bandwidth measurements
–From various sites to S3 (Simple Storage Service)
–Between S3, EC2 (Elastic Compute Cloud) and SQS (Simple Queuing Service)

15 Effective Bandwidth varies heavily based on (network) geography!

MB Get Ops from EC2 to S3 Throughput is relatively stable, except when internal network was reconfigured.

17 Read and Write throughputs: larger block size is better (but beyond some block size, it makes little difference).
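One simple way to reason about this plateau is a fixed per-request overhead model: each request pays a constant setup cost and then streams its data at a peak rate. This model and the numbers below (50 ms overhead, 20 MB/s peak) are assumptions made for illustration, not measurements or a model from the paper.

```python
def effective_throughput(block_bytes, per_request_overhead_s, peak_bandwidth_bps):
    """Bytes per second achieved when each request pays a fixed overhead
    before streaming block_bytes at peak_bandwidth_bps (bytes per second)."""
    transfer_time = per_request_overhead_s + block_bytes / peak_bandwidth_bps
    return block_bytes / transfer_time

# Hypothetical parameters, chosen only to show the shape of the curve.
OVERHEAD_S = 0.05
PEAK_BPS = 20e6

for size in (1e3, 1e4, 1e5, 1e6, 1e7, 1e8):
    mbps = effective_throughput(size, OVERHEAD_S, PEAK_BPS) / 1e6
    print(f"{int(size):>11} bytes -> {mbps:6.2f} MB/s")
```

Under this model, small blocks are dominated by the overhead term, while large blocks approach the peak rate, so increasing the block size further changes little.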

18 Concurrency: Consecutive requests see highly correlated performance, especially for large requests
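One way to quantify "consecutive requests are correlated" is the lag-1 autocorrelation of a series of per-request throughput measurements. The sketch below uses a plain Pearson correlation between each sample and its successor; the sample series are hypothetical and the paper may have used a different statistic.

```python
from statistics import mean

def lag1_correlation(samples):
    """Pearson correlation between each measurement and the one that follows it."""
    x, y = samples[:-1], samples[1:]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical throughput samples (MB/s) for consecutive requests.
slow_drift = [10, 11, 12, 12, 13, 14, 15, 15, 16, 17]    # neighbors look alike
alternating = [10, 17, 11, 16, 12, 15, 13, 14, 10, 17]   # neighbors look different

print(lag1_correlation(slow_drift))
print(lag1_correlation(alternating))
```

A value near 1 means a request's performance is a good predictor of the next request's performance; a value near 0 or below means it is not.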

19 QoS received by requests falls into multiple “classes”
MB xfers fall into 2 classes.

20 Results Summary
1. Effective bandwidth varies heavily based on geography!
2. Throughput is relatively stable, except when the internal network was reconfigured.
3. Read and write throughputs: larger requests are better, but throughput plateaus with file size
–Decreases overhead
4. Consecutive requests see highly correlated performance
5. QoS received by requests falls into multiple “classes”, but clients need to be given more control (e.g., via SLAs = Service Level Agreements)

21 What Do Real-Life Hadoop Workloads Look Like? (Cloudera) (Slide Acknowledgments: Brian Cho)

22 What They Did
Hadoop workloads from 5 Cloudera customers
–Diverse industries: “in e-commerce, telecommunications, media, and retail”
–2011
Hadoop workloads from Facebook
–2009, 2010 across same cluster

23 The Workloads
[Table: per-workload TB/Day, Jobs/Day, GB/Job]

24 Data access patterns (1/2)
Skew in access frequency across (HDFS) files: Zipf
90% of jobs access files of less than a few GBs; such files account for only 16% of bytes stored

25 Data access patterns (2/2)
Temporal locality in data accesses
[Figure annotations: “70%, < 10 mins” and “< 6 hrs”; compare]
= Sequences of Hadoop jobs?
Can you make Hadoop/HDFS better, now that you know these characteristics?

26 Burstiness
Plotted
–Sum of task-time (map + reduce) over an hour interval
–n-th percentile / median
Facebook
–From 2009 to 2010, peak-to-median ratio dropped from 31:1 to 9:1
–Claim: consolidation decreases burstiness
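The burstiness metric described above (hourly sums of task-time, then the n-th percentile divided by the median) can be sketched as follows. The task list, the nearest-rank percentile definition, and the choice to attribute a whole task to the hour in which it starts are assumptions made for this example.

```python
import math
from statistics import median

def hourly_task_time(tasks, n_hours):
    """Sum task durations (seconds) per hour, keyed by the hour a task started in.

    tasks is an iterable of (start_time_s, duration_s) pairs. Attributing a whole
    task to its starting hour is a simplification made for this sketch.
    """
    totals = [0.0] * n_hours
    for start_s, duration_s in tasks:
        totals[int(start_s // 3600)] += duration_s
    return totals

def percentile(values, pct):
    """Nearest-rank percentile for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def burstiness(hourly_totals, pct=100):
    """Ratio of the pct-th percentile hour to the median hour (pct=100 is peak-to-median)."""
    return percentile(hourly_totals, pct) / median(hourly_totals)

# Hypothetical map/reduce tasks: (start time in seconds, duration in seconds).
tasks = [(0, 100), (1800, 100), (3700, 50), (7300, 900), (11000, 100)]
totals = hourly_task_time(tasks, n_hours=4)
print(totals)
print(burstiness(totals))
```

A peak-to-median ratio of 31:1 in this metric would mean the busiest hour carried 31 times the task-time of a typical hour.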

27 High-level Processing Frameworks
Each cluster prefers 1-2 data processing frameworks

28 Classification by multi-dimensional clustering

29 Results Summary
Workloads are different across industries
Yet there are commonalities
–Zipf distribution for file access frequency
–Slope is the same across all industries
90% of all jobs access small files, while the larger files accessed by the other 10% account for 84% of the bytes stored
–Parallels p2p systems (mp3-mpeg split)
A few frameworks are popular for each cluster
Lots of small jobs

30 Summary
We design algorithms, implement and deploy them
But when you factor in the real world, unexpected characteristics may arise
It is important to understand these characteristics to build better distributed systems for the real world

31 Optional Slides - Understanding Availability (P2P Systems) R. Bhagwan, S. Savage, G. Voelker University of California, San Diego

32 What They Did
Measurement study of a peer-to-peer (P2P) file sharing application
–Overnet (January 2003)
–Based on Kademlia, a DHT based on the XOR routing metric
Each node uses a random self-generated ID
The ID remains constant (unlike the IP address)
Used to collect availability traces
–Closed-source
Analyzed the collected data to characterize availability
Availability = % of time a node is online (node = user, or machine)
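The availability definition above can be approximated from periodic probes as the fraction of probes a node answered. A minimal sketch follows, assuming traces are keyed by the node's constant ID rather than its IP address; the trace format and sample values are hypothetical.

```python
def availability(probe_results):
    """Fraction of probes a node answered; probe_results is a list of booleans,
    one per probe round (True means the node replied)."""
    return sum(probe_results) / len(probe_results)

# Hypothetical probe traces keyed by node ID (IDs stay constant, IPs may not).
traces = {
    "id-a": [True, True, False, True],
    "id-b": [False, False, False, True],
}
per_node = {node: availability(results) for node, results in traces.items()}
print(per_node)
```

Because a node can only be observed at probe times, this estimate depends on the probing interval; shorter online periods between two probes go unnoticed.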

33 What They Did
Crawler:
–Takes a snapshot of all the active hosts by repeatedly requesting 50 randomly generated IDs.
–The requests lead to discovery of some hosts (through routing requests), which are sent the same 50 IDs, and the process is repeated.
–Run once every 4 hours to minimize impact
Prober:
–Probes the list of available IDs to check for availability
By sending a request to ID I; the request succeeds only if I replies
Does not use TCP, which avoids problems with NAT and DHCP
–Used on only 2400 randomly selected hosts from the initial list
–Run every 20 minutes
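The crawler loop described above can be sketched as a toy simulation: every discovered host is sent the same set of lookup IDs, and any hosts named in the replies are added to the snapshot and queried in turn. This is not the Overnet protocol or the authors' tool; `network`, `closest_contacts`, and `crawl` are hypothetical names, and an in-memory dictionary stands in for real network requests.

```python
import random

def closest_contacts(routing_table, target_id, k=3):
    """Contacts a host would return for a lookup: its k known IDs closest to target_id by XOR."""
    return sorted(routing_table, key=lambda node_id: node_id ^ target_id)[:k]

def crawl(network, bootstrap_id, target_ids):
    """Snapshot of reachable hosts: send every discovered host the same lookups.

    network maps a node ID to the list of node IDs in its routing table; it
    replaces real network requests in this toy simulation.
    """
    discovered = {bootstrap_id}
    frontier = [bootstrap_id]
    while frontier:
        host = frontier.pop()
        for target in target_ids:
            for contact in closest_contacts(network[host], target):
                if contact not in discovered:
                    discovered.add(contact)
                    frontier.append(contact)
    return discovered

# Hypothetical overlay: 200 hosts with 16-bit IDs, each knowing 10 random peers.
rng = random.Random(1)
ids = rng.sample(range(2 ** 16), 200)
network = {n: rng.sample([m for m in ids if m != n], 10) for n in ids}
targets = [rng.randrange(2 ** 16) for _ in range(50)]  # the 50 random lookup IDs

snapshot = crawl(network, ids[0], targets)
print(f"discovered {len(snapshot)} of {len(ids)} hosts")
```

Using many random target IDs matters because each lookup only reveals the contacts closest to that target; different targets expose different parts of each host's routing table.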

34 Scale of Data
Ran for 15 days, from January 14 to January 28, 2003 (with problems on January 21)
Each pass of the crawler yielded 40,000 hosts.
A single day (6 crawls) yielded between 70,000 and 90,000 unique hosts
of the 2400 randomly selected hosts probes responded at least once

35 Multiple IP Hosts

36 Availability

37 Results Summary
1. Overall availability is low
2. Diurnal patterns exist in availability
3. Availabilities are uncorrelated across nodes
4. High churn exists