Sajitha Naduvil-vadukootu


11.2 Processes
Sajitha Naduvil-vadukootu

Overview
- Review: Processes
- NFS: Network File System (1985)
- HDFS: Hadoop Distributed File System (2006)
- Spark Architecture (2010)
- IPFS: The Permanent Web (2014)
- Future Work

Review: Processes
- Processes form the basis of how work gets done in a system.
- How are they organized? Single-threaded or multithreaded; stateful or stateless.
- Architecture: client-server, clusters, master-worker, peer-to-peer.
- How communication is done.
- Code migration: sending the computation to the machine that holds the data instead of moving the data to the computation.
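
The code-migration point can be illustrated with a small sketch. It is a toy model only: the Worker class and its fetch_all and run methods are hypothetical names invented here, and the "network" is just a method call. It contrasts pulling all data to the client with shipping a function to each worker and returning only a small partial result.

```python
class Worker:
    """Hypothetical worker that owns one partition of the data."""

    def __init__(self, partition):
        self.partition = list(partition)

    def fetch_all(self):
        # Data shipping: the whole partition crosses the (simulated) network.
        return list(self.partition)

    def run(self, func):
        # Code migration: the function executes where the data lives,
        # and only its result is returned.
        return func(self.partition)


workers = [Worker(range(0, 1000)), Worker(range(1000, 2000))]

# Data shipping: pull every element to the client, then compute.
total_shipped = sum(sum(w.fetch_all()) for w in workers)

# Code migration: send `sum` to each worker and combine the partial results.
total_migrated = sum(w.run(sum) for w in workers)

print(total_shipped, total_migrated)
```

Both totals are the same; the difference is that the second approach moves two integers across the simulated network instead of two thousand.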

NFS: Network File System
- Gives clients transparent access to remote file systems.
- Integrated into the Unix kernel through a Virtual File System (VFS) interface.
- Uses a Remote Procedure Call (RPC) package for communication.
- Calls from client to server are synchronous, and the server is stateless.
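
What "stateless server" means in practice can be sketched as follows. This is not the NFS protocol or its RPC layer: StatelessFileServer and the handle "fh1" are hypothetical, and the call is a plain synchronous method call. The point it shows is that every request carries the file handle, offset, and count, so the server keeps no per-client open-file table and a restart between requests is invisible to the client.

```python
class StatelessFileServer:
    """Toy server: no per-client state, only the stored files."""

    def __init__(self, files):
        self._files = dict(files)  # file handle -> file contents (bytes)

    def read(self, handle, offset, count):
        # Everything needed to serve the request is in its arguments.
        data = self._files[handle]
        return data[offset:offset + count]


storage = {"fh1": b"hello distributed world"}

server = StatelessFileServer(storage)
first = server.read("fh1", 0, 5)

# Simulate a server crash and restart: a fresh server object over the same
# storage. The client simply issues its next self-contained request.
server = StatelessFileServer(storage)
second = server.read("fh1", 6, 11)

print(first, second)
```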

HDFS: Hadoop Distributed File System
- Abstracts the cluster's storage, presenting a single file system.
- Flexibility (schema-less), durability, fault tolerance, and balanced data distribution.
- Relaxed consistency: files are write-once with a single writer at a time, so there is no general locking for concurrent writes to the same file.
- Files are split into blocks (chunks) that are replicated for fault tolerance.
- MapReduce is the data processing model that runs on top of it.
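
The split-and-replicate idea can be sketched in a few lines. This is a simplified model, not the HDFS implementation: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data, block_size):
    """Cut the file contents into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


data = b"x" * 10
blocks = split_into_blocks(data, block_size=4)
nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data nodes
placement = place_replicas(len(blocks), nodes)

# If one node fails, every block still has live replicas elsewhere.
failed = "dn1"
surviving = {b: [n for n in ns if n != failed] for b, ns in placement.items()}

print(placement)
print(surviving)
```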

Spark Architecture
- Designed for iterative jobs over large data sets and for interactive queries.
- Runs on top of HDFS and is used from application code as a library.
- Application code is sent to the workers.
- Master-worker architecture.
- Resilient Distributed Datasets (RDDs).
- Job scheduling.
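
The idea behind Resilient Distributed Datasets can be sketched with a toy class. ToyRDD below is hypothetical and is not the Spark API: it runs in a single process and has no partitions. It only illustrates two properties described in the RDD paper: transformations are recorded lazily as a lineage, and lost results are rebuilt by replaying that lineage rather than by restoring a replica.

```python
class ToyRDD:
    """Single-process stand-in for an RDD: a data source plus a lineage."""

    def __init__(self, source, lineage=()):
        self._source = source            # function that returns the base data
        self._lineage = tuple(lineage)   # ordered (kind, function) pairs
        self._cache = None               # materialized result, if any

    def map(self, func):
        # Lazy: only extends the lineage, computes nothing.
        return ToyRDD(self._source, self._lineage + (("map", func),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Materialize by replaying the lineage if no cached result exists.
        if self._cache is None:
            data = list(self._source())
            for kind, func in self._lineage:
                if kind == "map":
                    data = [func(x) for x in data]
                else:
                    data = [x for x in data if func(x)]
            self._cache = data
        return list(self._cache)

    def lose_cache(self):
        # Simulate losing the in-memory result on a failed worker.
        self._cache = None


squares_of_evens = (
    ToyRDD(lambda: range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
)

result = squares_of_evens.collect()
squares_of_evens.lose_cache()
recomputed = squares_of_evens.collect()  # rebuilt from lineage

print(result, recomputed)
```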

IPFS: The InterPlanetary File System
- A global file system that can serve very large data sets.
- Peer-to-peer.
- High throughput for accessing large (petabyte-scale) data files.
- No single point of failure.
- Peers do not need to trust each other.
- Inspired by BitTorrent (a file-sharing application) and HTTP (a protocol).
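
Why peers do not need to trust each other follows from content addressing, which can be sketched as below. This is a simplification, not the IPFS implementation: the Peer class and fetch_verified function are hypothetical, and a plain SHA-256 hex digest stands in for the identifiers IPFS actually uses. Because the address is derived from the content, the requester can check any reply against the address it asked for.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Address of a piece of content: the hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical peer holding a content-addressed key-value store."""

    def __init__(self):
        self.store = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self.store[cid] = data
        return cid

    def get(self, cid: str):
        return self.store.get(cid)


def fetch_verified(peers, cid):
    """Ask peers in turn; accept only data whose hash matches the address."""
    for peer in peers:
        data = peer.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    return None


honest = Peer()
cid = honest.put(b"hello ipfs")

malicious = Peer()
malicious.store[cid] = b"tampered"  # wrong content under the right address

fetched = fetch_verified([malicious, honest], cid)
print(fetched)
```

The tampered reply is rejected without any knowledge of which peer is honest, and any peer holding the correct bytes can serve them.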

Future Work
- In a master-worker architecture, resource allocation can be improved, or even automated, by monitoring the computing capacity of the worker nodes.
- Instead of a single master, whose memory becomes a constraint, use multiple collaborating masters or master-less systems (grids, peer-to-peer).
- A more unified interface is needed for accessing the underlying data.
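
One possible reading of capacity-aware allocation is sketched below. It is an assumption about what such a scheduler might look like, not a design taken from any of the systems discussed: the assign_tasks function, the worker names, and the abstract "capacity units" are all hypothetical. Each task goes to the worker that currently reports the most free capacity.

```python
import heapq


def assign_tasks(task_costs, worker_capacity):
    """Greedy assignment: each task goes to the worker with most free capacity."""
    # Max-heap via negated free capacity; ties are broken by worker name.
    heap = [(-cap, name) for name, cap in worker_capacity.items()]
    heapq.heapify(heap)
    assignment = {}
    for task, cost in task_costs.items():
        neg_free, name = heapq.heappop(heap)
        free = -neg_free
        if cost > free:
            # The best worker cannot fit the task, so no worker can.
            raise RuntimeError(f"no worker has capacity for task {task!r}")
        assignment[task] = name
        heapq.heappush(heap, (-(free - cost), name))
    return assignment


capacity = {"w1": 8, "w2": 4}     # reported free capacity per worker
tasks = {"a": 3, "b": 3, "c": 2}  # cost of each task, in the same units

assignment = assign_tasks(tasks, capacity)
print(assignment)
```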

References
[1] Tanenbaum, Andrew S., and Maarten Van Steen. Distributed Systems: Principles and Paradigms. Prentice-Hall, 2007.
[2] Sandberg, Russel, et al. "Design and implementation of the Sun network filesystem." Proceedings of the Summer USENIX Conference, 1985.
[3] Zaharia, Matei, et al. "Spark: Cluster computing with working sets." HotCloud, 2010.
[4] Zaharia, Matei, et al. "Apache Spark: A unified engine for big data processing." Communications of the ACM 59, 11 (October 2016), 56-65. DOI: https://doi.org/10.1145/2934664
[5] Karau, Holden, Andy Konwinski, Patrick Wendell, and Matei Zaharia. Learning Spark: Lightning-Fast Big Data Analysis.
[6] Benet, Juan. "IPFS - Content addressed, versioned, P2P file system." arXiv preprint arXiv:1407.3561 (2014).
[7] https://ipfs.io/
[8] Shvachko, Konstantin, et al. "The Hadoop distributed file system." 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 2010.

Thank you