Cloud Data Storage
Presented by: Maedeh Tashakkorian
Supervisor: Hadi Salimi
Mazandaran University of Science and Technology
February 2011

Outline
- Motivation
- Storage as a Service (StaaS)
- Cloud providers
- Cloud storage challenges
- Existing systems and services
- MapReduce
- References
Cloud Data Storage - Maedeh Tashakkorian

Motivation
- Greater resource agility: respond to business demands more effectively
- Greater business agility: focus on solving business problems, not on infrastructure issues
- Manage costs: shift from capital expenditures to operational expenditures

Storage as a Service (StaaS)
- A third-party provider rents out space on its storage to customers
- Billing follows a cost-per-gigabyte-stored or a cost-per-data-transferred model
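A small calculation makes the two billing models concrete. The rates below are invented placeholders, not any real provider's prices, and `monthly_bill` is a hypothetical helper. The function adds both components. A provider that uses only one of the two models corresponds to setting the other rate to zero.

```python
# Hypothetical rates for illustration only; they are not any real provider's prices.
PRICE_PER_GB_STORED = 0.15       # currency units per GB stored per month (assumed)
PRICE_PER_GB_TRANSFERRED = 0.10  # currency units per GB transferred (assumed)


def monthly_bill(gb_stored: float, gb_transferred: float,
                 per_gb_stored: float = PRICE_PER_GB_STORED,
                 per_gb_transferred: float = PRICE_PER_GB_TRANSFERRED) -> float:
    """Return one month's charge: stored volume times its rate plus transferred volume times its rate."""
    if gb_stored < 0 or gb_transferred < 0:
        raise ValueError("volumes must be non-negative")
    storage_cost = gb_stored * per_gb_stored
    transfer_cost = gb_transferred * per_gb_transferred
    return round(storage_cost + transfer_cost, 2)


if __name__ == "__main__":
    # 200 GB stored and 50 GB transferred: 200 * 0.15 + 50 * 0.10 = 30.0 + 5.0
    print(monthly_bill(200, 50))  # 35.0
    # Pure cost-per-gigabyte-stored model: ignore transfer by zeroing its rate.
    print(monthly_bill(200, 50, per_gb_transferred=0.0))  # 30.0
```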

Cloud providers
- Google Docs
- Web providers
- Flickr and Picasa
- YouTube
- Facebook and MySpace
- MediaMax and Strongspace

Cloud storage challenges
- Security
- Reliability
- Outages
- Theft

Existing Systems and Services

MapReduce
- What is MapReduce?
- Examples
- Execution overview
- Fault tolerance

What is MapReduce?
- A programming model for problems where the input data is large and we want to use 1000s of CPUs
- User-defined functions behind a simple and powerful interface
MapReduce provides:
- Automatic parallelization and distribution
- Fault tolerance and I/O scheduling
- Monitoring and status updates

MapReduce Concept
- Map: perform a function on individual values in a data set to create a new list of values
- Reduce: combine values in a data set to create a new value
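The two operations can be illustrated with Python's built-in `map` and `functools.reduce` on a plain list. This shows only the list-level idea. In the MapReduce framework itself, the user's map and reduce functions work on key/value pairs, as the word-count example later in the deck shows.

```python
from functools import reduce

values = [1, 2, 3, 4]

# Map: apply a function to each individual value, producing a new list of values.
squares = list(map(lambda x: x * x, values))

# Reduce: combine the values of a list into a single new value (here, their sum).
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)  # [1, 4, 9, 16] 30
```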

Examples
- Distributed grep
- Count of URL access frequency
- Reverse web-link graph
- Inverted index
- Distributed sort
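Several of these examples reduce to a pair of small functions. The sketch below runs them through a single-process stand-in for a MapReduce job. `run_job` and all the map/reduce function names are hypothetical, and so are the log-line format `"<client> <url>"` and the grep pattern `error`. Distributed sort is not shown separately. It relies mainly on the framework's partitioning and key ordering, which this toy runner imitates only by visiting keys in sorted order.

```python
import re
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Apply `mapper` to every (key, value) record, group the intermediate pairs
    by key, then call `reducer` once per key in sorted key order.
    A single-process stand-in for a real cluster (illustration only)."""
    intermediate = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            intermediate[k2].append(v2)
    return {k2: reducer(k2, vs) for k2, vs in sorted(intermediate.items())}


# Distributed grep: emit a line if it matches; reduce passes values through.
def grep_map(line_no, line, pattern=re.compile(r"error")):  # pattern is assumed
    if pattern.search(line):
        yield line_no, line

def identity_reduce(key, values):
    return values


# URL access frequency: one (URL, 1) per log entry, summed per URL.
def url_map(entry_no, log_line):
    url = log_line.split()[1]  # assumed log format: "<client> <url>"
    yield url, 1

def sum_reduce(url, counts):
    return sum(counts)


# Reverse web-link graph: for each link source -> target emit (target, source).
def links_map(source, targets):
    for target in targets:
        yield target, source

def list_reduce(target, sources):
    return sorted(sources)


# Inverted index: emit (word, doc_id); reduce sorts the distinct document IDs.
def index_map(doc_id, text):
    for word in text.split():
        yield word, doc_id

def postings_reduce(word, doc_ids):
    return sorted(set(doc_ids))


if __name__ == "__main__":
    logs = [(0, "10.0.0.1 /index.html"), (1, "10.0.0.2 /about.html"),
            (2, "10.0.0.1 /index.html")]
    print(run_job(logs, url_map, sum_reduce))
    # {'/about.html': 1, '/index.html': 2}
    pages = [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])]
    print(run_job(pages, links_map, list_reduce))
    # {'b.html': ['a.html'], 'c.html': ['a.html', 'b.html']}
```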

Execution Overview
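Dean and Ghemawat [6] describe the execution in these steps. First, the input is split into M pieces. Each map task then writes its intermediate pairs into R regions chosen by a partitioning function, whose default is hash(key) mod R. Finally, each of the R reduce tasks collects its region from every map task, processes the keys in sorted order, and produces one output file. The single-process sketch below imitates that data flow under stated assumptions. The round-robin split and the names `execute` and `partition` are hypothetical. `zlib.crc32` stands in for the hash because Python's built-in string hash changes between runs. There are no real workers, disks, or network.

```python
import zlib
from collections import defaultdict


def partition(key: str, r: int) -> int:
    """hash(key) mod R, using crc32 so the result is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % r


def execute(documents, map_fn, reduce_fn, m=3, r=2):
    # 1. Split the input into M pieces (here: round-robin over a list).
    splits = [documents[i::m] for i in range(m)]

    # 2. Map phase: each map task partitions its output into R regions.
    regions = [[defaultdict(list) for _ in range(r)] for _ in range(m)]
    for task_id, split in enumerate(splits):
        for doc in split:
            for key, value in map_fn(doc):
                regions[task_id][partition(key, r)][key].append(value)

    # 3. Reduce phase: reduce task j merges region j from every map task,
    #    then reduces each key's values, visiting keys in sorted order.
    outputs = []
    for j in range(r):
        merged = defaultdict(list)
        for task_id in range(m):
            for key, values in regions[task_id][j].items():
                merged[key].extend(values)
        outputs.append({key: reduce_fn(key, merged[key]) for key in sorted(merged)})
    return outputs  # one output "file" per reduce task


if __name__ == "__main__":
    docs = ["the weather is good", "today is good", "good weather is good"]
    word_pairs = lambda doc: ((w, 1) for w in doc.split())
    for j, out in enumerate(execute(docs, word_pairs, lambda k, vs: sum(vs))):
        print(f"reduce task {j}:", out)
```

Every key lands in exactly one reduce task's output, because all map tasks apply the same partitioning function.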

Example for MapReduce
Page 1: the weather is good
Page 2: today is good
Page 3: good weather is good

Map output
Worker 1: (the 1), (weather 1), (is 1), (good 1)
Worker 2: (today 1), (is 1), (good 1)
Worker 3: (good 1), (weather 1), (is 1), (good 1)

Reduce input
Worker 1: (the 1)
Worker 2: (is 1), (is 1), (is 1)
Worker 3: (weather 1), (weather 1)
Worker 4: (today 1)
Worker 5: (good 1), (good 1), (good 1), (good 1)

Reduce output
Worker 1: (the 1)
Worker 2: (is 3)
Worker 3: (weather 2)
Worker 4: (today 1)
Worker 5: (good 4)
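The three-page word-count example can be reproduced in a few lines of Python. In this single-process sketch, plain function calls replace the "workers" of the slides. The names `map_words`, `shuffle`, and `reduce_counts` are hypothetical. The intermediate lists correspond to the map output, reduce input, and reduce output shown above.

```python
from collections import defaultdict

pages = {
    "Page 1": "the weather is good",
    "Page 2": "today is good",
    "Page 3": "good weather is good",
}


def map_words(text):
    """Map: emit (word, 1) for every word on the page."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Group intermediate pairs by word, giving each reducer its input list."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(word, counts):
    """Reduce: add up the 1s emitted for one word."""
    return word, sum(counts)


map_output = [pair for text in pages.values() for pair in map_words(text)]
reduce_input = shuffle(map_output)
reduce_output = dict(reduce_counts(w, c) for w, c in reduce_input.items())
print(reduce_output)
# {'the': 1, 'weather': 2, 'is': 3, 'good': 4, 'today': 1}
```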

Fault Tolerance
- Worker failure
- Master failure
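In the design described by Dean and Ghemawat [6], the master pings every worker periodically and marks a worker as failed if it does not respond within some time. The failed worker's tasks are then handled as follows:
- In-progress map and reduce tasks are reset to idle and rescheduled.
- Completed map tasks are also re-executed, because their output lived on the failed machine's local disk.
- Completed reduce tasks are kept, because their output is already in the global file system.

That implementation simply aborts the job if the master itself fails. The sketch below covers only the worker-failure bookkeeping. The `Task` class, the state names, and both functions are hypothetical illustrations, not code from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    task_id: int
    state: State = State.IDLE
    worker: Optional[str] = None  # worker currently or last assigned


def find_failed_workers(last_ping: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Workers whose last ping response is older than `timeout` seconds count as failed."""
    return [w for w, t in last_ping.items() if now - t > timeout]


def handle_worker_failure(tasks: List[Task], failed_worker: str) -> List[Task]:
    """Reset the failed worker's tasks to IDLE so they can be rescheduled.

    Completed map tasks are reset too (their output was on the dead machine's
    local disk); completed reduce tasks are left alone (output is in the global FS)."""
    rescheduled = []
    for task in tasks:
        if task.worker != failed_worker:
            continue
        lost_map_output = task.kind == "map" and task.state is State.COMPLETED
        if task.state is State.IN_PROGRESS or lost_map_output:
            task.state = State.IDLE
            task.worker = None
            rescheduled.append(task)
    return rescheduled
```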

References
[1] Wu, J., L. Ping, et al. (2010). "Cloud Storage as the Infrastructure of Cloud Computing." IEEE.
[2] Velte, T., A. Velte, et al. (2009). Cloud Computing: A Practical Approach. McGraw-Hill Osborne Media.
[3] Moreno, J., D. Kossmann, et al. (2010). "A testing framework for cloud storage systems."
[4] Jin, C. and R. Buyya (2009). "MapReduce Programming Model for .NET-Based Cloud Computing." Euro-Par 2009 Parallel Processing:
[5] DeCandia, G., D. Hastorun, et al. (2007). "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41(6):
[6] Dean, J. and S. Ghemawat (2008). "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51(1):
[7] Chang, F., J. Dean, et al. (2008). "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26(2):
[8] (2010). "Amazon Elastic Compute Cloud (Amazon EC2)." Retrieved Jan 29, 2011, from
[9] (2010). "Amazon Simple Storage Service (Amazon S3)." Retrieved Jan 29, 2011, from
[10] (2010). "Enterprise Cloud Storage - Nirvanix Storage Delivery Network." Retrieved Jan 29, 2011, from
[11] (2011). "BigTable - Wikipedia, the free encyclopedia." Retrieved Jan 29, 2011, from
[12] (2011). "Dedicated Server, Managed Hosting, Web Hosting by Rackspace Hosting." Retrieved Jan 29, 2011, from
[13] (2011). "Product Overview - Google Storage for Developers - Google Code." Retrieved Jan 29, 2011, from
[14] (2011). "salesforce.com." Retrieved Jan 29, 2011, from