Tachyon: Reliable File Sharing at Memory-Speed Across Cluster Frameworks
Haoyuan Li, UC Berkeley

Outline
Motivation
System Design
Evaluation Results
Release Status
Future Directions

Memory is King

Memory Trend
RAM throughput increasing exponentially

Disk Trend
Disk throughput increasing slowly

Consequence
Memory locality is key to achieving interactive queries and fast query response.

Current Big Data Ecosystem
Many frameworks already leverage memory, e.g. Spark, Shark, and other projects.
File sharing among jobs goes through files replicated to disk; replication enables fault tolerance.
Problems:
Disk scans make reads slow.
Synchronous disk replication makes writes even slower.

Tachyon Project
Reliable file sharing at memory speed across cluster frameworks/jobs.
Challenge: how do we achieve reliable file sharing without replication?

Idea
Re-computation (lineage) based storage, using memory aggressively.
One copy of data in memory (fast).
Upon failure, re-compute data using lineage (fault tolerant).

Stack

System Architecture

Lineage

Lineage Information
Binary program
Configuration
Input files list
Output files list
Dependency type
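To make the record concrete, here is a minimal Scala sketch of such a lineage record; the names are invented for illustration, not Tachyon's actual classes.

sealed trait DependencyType
case object Narrow extends DependencyType
case object Wide extends DependencyType

case class Lineage(
  binaryProgram: String,              // the program that produced the output
  configuration: Map[String, String], // the configuration it ran with
  inputFiles: List[String],           // files the program read
  outputFiles: List[String],          // files the program wrote
  dependency: DependencyType          // how outputs depend on inputs
)

object Recovery {
  // Upon failure, recovery is "run the recorded program again";
  // `launch` stands in for whatever actually executes the binary.
  def recompute(lineage: Lineage, launch: Lineage => Unit): Unit = launch(lineage)
}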

Fault Recovery Time
What is the re-computation cost?

Example

Asynchronous Checkpoint
Better than existing solutions even under failure.
Bounded recovery time (naïve and snapshot asynchronous checkpointing).
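A minimal sketch of what asynchronous checkpointing can look like, assuming a persist callback that copies one in-memory file to the underlying store (HDFS, S3, ...); this is illustrative, not Tachyon's implementation. Writes never block on the copy: data is served from memory while a background thread checkpoints it.

import java.util.concurrent.{Executors, TimeUnit}

class AsyncCheckpointer(listFiles: () => Seq[String],
                        persist: String => Unit,
                        intervalSec: Long = 60) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Periodically walk the in-memory files and persist each one
  // to the underlying store, off the write path.
  def start(): Unit =
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = listFiles().foreach(persist)
    }, intervalSec, intervalSec, TimeUnit.SECONDS)

  def stop(): Unit = scheduler.shutdown()
}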

Master Fault Tolerance
Multiple masters.
Use ZooKeeper to elect a leader.
After a crash, workers contact the new leader.
The new leader's state is rebuilt from the contents of the workers' caches.
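A sketch of this election scheme using Apache Curator's LeaderLatch recipe over ZooKeeper (Tachyon's own election code may differ; the host and znode path are placeholders). Standby masters block in await(); the winner becomes leader, and workers then report their cached blocks so it can rebuild cluster state.

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.LeaderLatch
import org.apache.curator.retry.ExponentialBackoffRetry

object MasterElection {
  def main(args: Array[String]): Unit = {
    val zk = CuratorFrameworkFactory.newClient(
      "zk-host:2181", new ExponentialBackoffRetry(1000, 3))
    zk.start()

    val latch = new LeaderLatch(zk, "/tachyon/leader")
    latch.start()
    latch.await() // returns only once this master has been elected

    // ... act as leader: accept worker registrations, rebuild metadata ...
  }
}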

Implementation Details
15,000+ lines of Java.
Thrift for data transport.
Underlying file system support: HDFS, S3, local FS, GlusterFS.
Build and CI: Maven, Jenkins.

Sequential Read using Spark
(chart: read throughput, with baselines for theoretical maximum disk throughput and Flat Datacenter Storage)

Sequential Write using Spark
(chart: write throughput, with baselines for theoretical maximum disk throughput and Flat Datacenter Storage)

Realistic Workflow using Spark

Realistic Workflow Under Failure

Conviva Spark Query (I/O intensive)
Tachyon outperforms the Spark cache because it avoids Java GC overhead.
More than 75x speedup.

Conviva Spark Query (less I/O intensive)
12x speedup.
GC kicks in earlier for the Spark cache.

Alpha Status: Releases
Developer Preview: v0.2.1 (4/25/2013)
Contributions from:

Alpha Status
The first read of a file caches it in memory.
Writes go synchronously to HDFS (no lineage information in the Developer Preview release).
MapReduce and Spark can run without any code change (ser/de becomes the new bottleneck).

Current Features
Java-like file API
Compatible with Hadoop
Master fault tolerance
Native support for raw tables
WhiteList, PinList
Command line interaction
Web user interface
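To give a flavor of the "Java-like file API", a hypothetical sketch of its shape; these trait and method names are illustrative assumptions, not Tachyon's real client classes.

// Hypothetical shape of the Java-like file API listed above.
trait TachyonOutStream {
  def write(bytes: Array[Byte]): Unit
  def close(): Unit
}

trait TachyonFileSystem {
  def create(path: String): TachyonOutStream  // create a file for writing
  def open(path: String): Array[Byte]         // read a whole file
  def delete(path: String): Boolean           // remove a file
  def pin(path: String): Unit                 // keep a file in memory (cf. PinList)
}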

Spark without Tachyon
val file = sc.textFile("hdfs://ip:port/path")

Spark with Tachyon
val file = sc.textFile("tachyon://ip:port/path")
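Since only the URI scheme changes, a whole Spark job can read from and write back through Tachyon. A small word-count sketch for the Spark shell (host, port, and paths are placeholders):

// Read input from Tachyon, count words, write results back to Tachyon.
val lines  = sc.textFile("tachyon://ip:port/input")
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("tachyon://ip:port/output")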

Shark without Tachyon
CREATE TABLE orders_cached AS SELECT * FROM orders;

Shark with Tachyon
CREATE TABLE orders_tachyon AS SELECT * FROM orders;
(Mirroring Shark's _cached naming convention for in-memory tables, the _tachyon suffix directs the table into Tachyon storage.)

Experiments on Shark
Shark (from version 0.7) can store tables in Tachyon with a fast columnar ser/de.

20 GB data / 5 machines:

                                 Spark Cache    Tachyon
  Table full scan                1.4 sec        1.5 sec
  GroupBys (10 GB Shark memory)  50 – 90 sec    45 – 50 sec
  GroupBys (15 GB Shark memory)  44 – 48 sec    37 – 45 sec

4 * 100 GB TPC-H data / 17 machines:

            Spark Cache    Tachyon
  TPC-H Q1  65.68 sec      24.75 sec
  TPC-H Q2  438.49 sec     139.25 sec
  TPC-H Q3  467.79 sec     55.99 sec
  TPC-H Q4  457.50 sec     111.65 sec

Future Directions
Efficient ser/de support
Fair sharing of memory
Full support for lineage
Next release is coming soon.

Acknowledgments
Research team: Haoyuan Li, Ali Ghodsi, Matei Zaharia, Eric Baldeschwieler, Scott Shenker, Ion Stoica
Code contributors: Haoyuan Li, Calvin Jia, Bill Zhao, Mark Hamstra, Rong Gu, Hobin Yoon, Vamsi Chitters, Reynold Xin, Srinivas Parayya, Dilip Joseph

Questions?
http://tachyon-project.org
https://github.com/amplab/tachyon