Distributed Data Storage and Parallel Processing Engine: Sector & Sphere. Yunhong Gu, Univ. of Illinois at Chicago.

What is Sector/Sphere? Sector: a distributed file system. Sphere: a parallel data processing engine (generalized MapReduce). Open source software, GPL/BSD licensed, written in C++. Development started in 2006.

Overview: Motivation, Sector, Sphere, Experimental Results.

Motivation. Super-computer model: expensive, data I/O bottleneck. Sector/Sphere model: inexpensive, parallel data I/O, data locality.

Motivation. Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated. Sector/Sphere model (cloud model): the cluster appears as a single entity to the developer, with a simplified programming interface; limited to certain data-parallel applications.

Motivation. Systems designed for a single data center require additional effort to locate and move data. The Sector/Sphere model supports wide-area data collection and distribution.

Sector Distributed File System (architecture diagram). Security server: user accounts, data protection, system security. Masters: metadata, scheduling, service provider. Slaves: data storage and processing. Clients: system access tools and application programming interfaces. Control connections use SSL; data is transferred over UDT, with optional encryption.

Sector Distributed File System. Sector stores files on the native/local file system of each slave node and does not split files into blocks. Pro: simple and robust, suitable for wide-area deployment, fast and flexible data processing. Con: users need to manage file sizes appropriately. The master nodes maintain the file system metadata; no permanent metadata is needed. The system is topology aware.

Sector: Performance. The data channel is set up directly between a slave and a client. Multiple active-active masters (for load balancing) are supported starting from version 1.24. UDT is used for high-speed data transfer: UDT is a high-performance UDP-based data transfer protocol, much faster than TCP over wide-area networks.

UDT: UDP-based Data Transfer. An open source UDP-based data transfer protocol with reliability control and congestion control. Fast, firewall friendly, and easy to use. Already used in many commercial and research software products. A minimal client sketch is shown below.
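
To make the data path concrete, here is a minimal UDT client sketch based on the UDT4 C++ API (UDT::startup, UDT::socket, UDT::connect, UDT::send, UDT::close, UDT::cleanup); the server address, port, and payload are placeholders, not values from the talk.

// Minimal UDT client sketch (UDT4 C++ API). Address, port, and payload
// below are placeholders.
#include <arpa/inet.h>
#include <cstring>
#include <iostream>
#include <udt.h>

int main()
{
    UDT::startup();                                   // initialize the UDT library

    UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);

    sockaddr_in serv_addr;
    std::memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(9000);                 // placeholder port
    inet_pton(AF_INET, "127.0.0.1", &serv_addr.sin_addr);

    if (UDT::ERROR == UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr)))
    {
        std::cerr << "connect: " << UDT::getlasterror().getErrorMessage() << std::endl;
        return 1;
    }

    const char* msg = "hello over UDT";
    if (UDT::ERROR == UDT::send(client, msg, std::strlen(msg) + 1, 0))
        std::cerr << "send: " << UDT::getlasterror().getErrorMessage() << std::endl;

    UDT::close(client);
    UDT::cleanup();
    return 0;
}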

Sector: Fault Tolerance. Sector uses replication for better reliability and availability. Replicas can be created either at write time (instantly) or periodically. Sector supports multiple active-active masters for high availability.

Sector: Security. Sector uses a security server to maintain user accounts and IP access control for masters, slaves, and clients. Control messages are encrypted (not completely finished in the current version). Data transfer can optionally be encrypted. The data transfer channel is set up by rendezvous; there is no listening server.

Sector: Tools and API. Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download; wildcard characters are supported. System monitoring: sysinfo. C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo. A FUSE interface is also available. A usage sketch follows.
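
The following is a hypothetical sketch of how a client application might drive the operations named above. The SectorClient class, its signatures, and the connection parameters are illustrative assumptions, not declarations from the Sector headers; the stub bodies only print what a real client would do.

// Hypothetical sketch: SectorClient, its signatures, and all parameters are
// illustrative assumptions, not the actual Sector C++ API. The stubs only
// print the calls to show a typical sequence: login, mkdir, open, write,
// close, logout.
#include <cstdio>
#include <string>

struct SectorClient
{
    bool login(const std::string& master, int port, const std::string& user)
    {
        std::printf("login to %s:%d as %s\n", master.c_str(), port, user.c_str());
        return true;
    }
    bool mkdir(const std::string& path) { std::printf("mkdir %s\n", path.c_str()); return true; }
    int  open(const std::string& path)  { std::printf("open %s\n", path.c_str()); return 1; }
    int  write(int fd, const char* buf, int size)
    {
        std::printf("write %d bytes to handle %d: %.*s\n", size, fd, size, buf);
        return size;
    }
    bool close(int fd) { std::printf("close handle %d\n", fd); return true; }
    void logout()      { std::printf("logout\n"); }
};

int main()
{
    SectorClient c;
    c.login("master.example.org", 6000, "test_user");   // placeholder master address
    c.mkdir("/demo");
    int fd = c.open("/demo/hello.txt");
    c.write(fd, "hello sector", 12);
    c.close(fd);
    c.logout();
    return 0;
}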

Sphere: Simplified Data Processing. Designed for data-parallel applications. Data is processed where it resides, or on the nearest possible node (locality). The same user-defined function (UDF) is applied to all elements (records, blocks, or files). Processing output can be written to Sector files or sent back to the client. A generalized Map/Reduce.

Sphere: Simplified Data Processing. Serial pseudocode:

for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, ...);

The equivalent Sphere client code:

SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc.run(sdss, "findBrownDwarf", ...);
myproc.read(result);

UDF signature:

findBrownDwarf(char* image, int isize, char* result, int rsize);
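
As an illustration of the UDF model, here is a minimal sketch of a UDF body using the simplified signature from the slide; the detection logic, threshold, and sample data are placeholders, and the real Sphere interface passes more context than these four arguments.

// Minimal sketch of a Sphere-style UDF using the simplified signature from
// the slide. The "detection" below is a placeholder threshold test.
#include <cstdio>
#include <cstring>

// Process one image buffer, write a summary into the result buffer.
int findBrownDwarf(char* image, int isize, char* result, int rsize)
{
    int candidates = 0;
    for (int i = 0; i < isize; ++i)
        if ((unsigned char)image[i] > 200)   // placeholder brightness threshold
            ++candidates;

    std::snprintf(result, rsize, "candidates=%d", candidates);
    return 0;
}

int main()
{
    char image[16] = { (char)250, (char)10, (char)220, 0 };  // fake 16-byte "image"
    char result[64];
    findBrownDwarf(image, sizeof(image), result, sizeof(result));
    std::printf("%s\n", result);   // prints: candidates=2
    return 0;
}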

Sphere: Data Movement. Slave -> slave (local). Slave -> slaves (shuffle/hash). Slave -> client.

Sphere/UDF vs. MapReduce

Sphere                | MapReduce
Record Offset Index   | Parser / Input Reader
UDF                   | Map
Hashing / Bucket      | Partition
-                     | Compare
UDF                   | Reduce
-                     | Output Writer

Sphere/UDF vs. MapReduce. Sphere is more straightforward and flexible: a UDF can be applied directly to records, blocks, files, and even directories; binary data is supported natively; sorting is required before Reduce, but it is optional in Sphere. Sphere uses a PUSH model for data movement, which is faster than the PULL model used by MapReduce.

Why Doesn't Sector Split Files? Certain applications need to process a whole file or even a whole directory. Certain legacy applications need a file or a directory as input. Certain applications need multiple inputs, e.g., everything in a directory. In Hadoop, all blocks would have to be moved to one node for processing in such cases, so there is no data locality benefit.

Load Balance. The number of data segments is much larger than the number of SPEs (Sphere Processing Engines). When an SPE completes a data segment, a new segment is assigned to it. Data transfer is balanced across the system to optimize network bandwidth usage. A small scheduling sketch follows.
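
This is a minimal sketch of the segment-assignment idea, not Sector's actual scheduler: there are many more segments than SPEs, and each idle SPE claims the next unprocessed segment as soon as it finishes the previous one. The segment and SPE counts are made up.

// Segment-assignment sketch: workers (SPEs) pull the next unprocessed
// segment from a shared counter until no work remains.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const int num_segments = 20;            // illustrative numbers only
    const int num_spes = 4;
    std::atomic<int> next_segment{0};       // shared index of the next segment

    auto spe = [&](int spe_id) {
        for (;;) {
            int seg = next_segment.fetch_add(1);   // claim the next segment
            if (seg >= num_segments)
                break;                             // no work left
            std::printf("SPE %d processes segment %d\n", spe_id, seg);
        }
    };

    std::vector<std::thread> workers;
    for (int i = 0; i < num_spes; ++i)
        workers.emplace_back(spe, i);
    for (auto& w : workers)
        w.join();
    return 0;
}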

Fault Tolerance. Map-style failures are recoverable: if one SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again. Reduce-style failures are unrecoverable. In small to medium systems, machine failure during run time is rare; if necessary, developers can split the input into multiple sub-tasks to reduce the cost of a reduce failure.

Open Cloud Testbed. Four racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2). 10 Gb/s inter-site connections over CiscoWave; 2 Gb/s inter-rack connections. Each node: two dual-core AMD CPUs, 12 GB RAM, a single 1 TB disk. The testbed will be doubled by September.

Open Cloud Testbed

The TeraSort Benchmark. Data is split into small files scattered across all slaves. Stage 1: on each slave, an SPE scans the local files and sends each record to a bucket file on a remote node according to its key. Stage 2: on each destination node, an SPE sorts all data inside each bucket.

TeraSort (diagram). Each 100-byte record consists of a 10-byte key and a 90-byte value. Stage 1: hash each record into one of 1024 buckets (Bucket-0 through Bucket-1023) based on the first 10 bits of its key. Stage 2: sort each bucket on the local node.
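
A minimal sketch of the Stage 1 bucket computation described above: the bucket index is taken from the first 10 bits of the 10-byte key. The record layout follows the slide; the function name and sample bytes are illustrative.

// Bucket index from the first 10 bits of a 100-byte TeraSort record
// (10-byte key + 90-byte value). The record contents are made up.
#include <cstdio>

const int RECORD_SIZE = 100;   // 10-byte key + 90-byte value

// Take the first 10 bits of the key: all 8 bits of key[0] plus the top 2
// bits of key[1], giving a bucket index in [0, 1023].
int bucketOf(const unsigned char* record)
{
    return (record[0] << 2) | (record[1] >> 6);
}

int main()
{
    unsigned char record[RECORD_SIZE] = {0};
    record[0] = 0xAB;                      // example key bytes
    record[1] = 0xCD;
    std::printf("bucket = %d\n", bucketOf(record));   // (0xAB << 2) | 0b11 = 687
    return 0;
}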

Performance Results: TeraSort. Run time in seconds, Sector v1.16 vs. Hadoop 0.17 (with 3 replicas and with 1 replica), for four configurations: UIC (300 GB), UIC + StarLight (600 GB), UIC + StarLight + Calit2 (900 GB), and UIC + StarLight + Calit2 + JHU (1.2 TB).

Performance Results: TeraSort. Sorting 1.2 TB on 120 nodes. Sphere hash and local sort stages: 981 s + 545 s. Hadoop: 3702 / 6675 seconds. Resource usage: Sphere hash stage, CPU 130%, memory 900 MB; Sphere local sort stage, CPU 80%, memory 1.4 GB; Hadoop, CPU 150%, memory 2 GB.

The MalStone Benchmark. Drive-by problem: a user visits a web site and gets compromised by malware. MalStone-A: compute the infection ratio of each site. MalStone-B: compute the infection ratio of each site, from the beginning to the end of every week.

MalStone (diagram). Each text record has the form: Event ID | Timestamp | Site ID | Compromise Flag | Entity ID. Records are transformed into key/value pairs keyed by site ID and time, carrying the compromise flag. Stage 1: process each record and hash it into buckets (site-000, site-001, ..., site-999) according to site ID. Stage 2: compute the infection rate for each site.
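
A minimal sketch of the Stage 2 aggregation described above: for each site, count visits and compromises and report the ratio. The simplified (site ID, flag) records and the sample data are illustrative, not the benchmark's actual parsing.

// MalStone-A style aggregation sketch: per-site infection ratio =
// compromised visits / total visits. Sample data is made up.
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // (site ID, compromise flag) pairs standing in for parsed log records.
    std::vector<std::pair<std::string, int>> events = {
        {"site-000", 0}, {"site-000", 1}, {"site-000", 0},
        {"site-001", 1}, {"site-001", 1},
    };

    std::map<std::string, std::pair<long, long>> counts;   // site -> (visits, compromises)
    for (const auto& e : events) {
        counts[e.first].first += 1;
        counts[e.first].second += e.second;
    }

    for (const auto& c : counts)
        std::printf("%s infection ratio: %.2f\n",
                    c.first.c_str(),
                    (double)c.second.second / c.second.first);
    return 0;
}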

Performance Results: MalStone. Processing 10 billion records on 20 Open Cloud Testbed nodes (local). (Results courtesy of Collin Bennett and Jonathan Seidman of Open Data Group.)

                          | MalStone-A | MalStone-B
Hadoop                    | 454m 13s   | 840m 50s
Hadoop Streaming / Python | 87m 29s    | 142m 32s
Sector/Sphere             | 33m 40s    | 43m 44s

System Monitoring (Testbed)

System Monitoring (Sector/Sphere)

For More Information Sector/Sphere code & docs: Open Cloud Consortium: NCDM: