1 © Chinese University, CSE Dept. Distributed Systems / 10 - 1 Distributed Systems Topic 10: Cloud Computing Systems Dr. Michael R. Lyu Computer Science & Engineering Department The Chinese University of Hong Kong

2 Outline
1. Cloud Computing Introduction
2. Virtualization Concept
3. MapReduce Computing Paradigm
4. Cloud Computing In Practice (Amazon AWS)
5. Summary

3 1 What Is Cloud Computing?
• Cloud computing has recently become a hot buzzword
  – More and more companies claim that their products are cloud computing based
• So, what is cloud computing?
  – A new name for an old technology, created by marketing people?
  – A new technology focusing on higher-speed computing?
  – A new programming paradigm?

4 1 What Is Cloud Computing?
• Let us see what cloud computing is by looking at its economic drive
  – Consider an example of hosting an Internet service
    » e.g., a website
[Figure: CUHK co. ltd. and its server]

5 1.1 Hosting Computing Systems
[Figure: CUHK co. ltd. with a single server]
• A single server is enough for a small business
• Only a small budget is needed

6 1.1 Hosting Computing Systems
[Figure: CUHK co. ltd. with several servers]
• More servers are needed if you have more clients
• Providing the service costs more manpower and money

7 1.1 Hosting Computing Systems
[Figure: CUHK co. ltd. with a data center]
• A huge data center is needed if you have a huge number of clients
• Providing the service costs a huge amount of money

9 1.1 Hosting Computing Systems
• The number of concurrent users is dynamic
  – You need to prepare for the peak number
    » Otherwise, bad user experience!
  – So you have to pay for the servers no matter whether you are using them or not
    » The money you spend to buy the hardware
    » The salary for the maintainers
    » The electricity (including that for air conditioning)
    » Other maintenance costs, e.g., repairing, upgrading, renting a room for the servers

10 1.2 Cloud Computing Motivations
• How can hosting computing systems be made more economical?
[Figure: CUHK co. ltd. and its server, with the thought: "I wish this cloud could really host my system so that I can provide a scalable, on-demand service"]

11 1.2 Cloud Computing Introduction
• Hosting computing systems in a scalable, on-demand way
  – To be economical!
• Cloud computing
  – An emerging computing concept that approaches such a goal
  – Based on many distributed system techniques
    » Service-oriented system architecture
    » Virtualization technology
    » Distributed computing paradigm

12 1.3 Pay As You Go
• The core of cloud computing is computing/storage outsourcing
  – To another company
  – To a dedicated unit in the same organization
• Pay for exactly what you have used, instead of paying for servers you own whether or not you use them
  – Pay as you go!!
  – Cost down!!!
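The cost argument can be made concrete with a small calculation. The following is a minimal sketch, not a pricing model: the hourly demand figures and both prices are hypothetical, and the function names are invented for illustration. It compares owning enough servers for the peak hour against renting only what each hour needs.

```python
def peak_provisioned_cost(hourly_demand, cost_per_server_hour):
    """Own enough servers for the busiest hour and pay for all of them every hour."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * cost_per_server_hour


def pay_as_you_go_cost(hourly_demand, cost_per_server_hour):
    """Pay only for the servers actually used in each hour."""
    return sum(hourly_demand) * cost_per_server_hour


# Hypothetical demand: number of servers needed in each of six hours.
demand = [2, 2, 3, 10, 4, 3]

# Hypothetical prices: 1.0 per server-hour when owned, 1.5 when rented.
owned = peak_provisioned_cost(demand, 1.0)   # 10 servers * 6 hours * 1.0
rented = pay_as_you_go_cost(demand, 1.5)     # 24 server-hours * 1.5
print(owned, rented)
```

Under these assumed numbers the rented option is cheaper even at a higher price per server-hour, because the peak (10 servers) is far above the average demand (4 servers).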

13 1.3 Outsourcing
• Electronics industry: manufacturing outsourcing
  – A brand vendor outsources to a manufacturer
  – Different manufacturing requirements in low season and high season
    » # of product lines
    » # of workers
• Cloud computing: computing and storage outsourcing
  – A service vendor (providing services for end users) outsources to a cloud computing provider
  – Different service requirements at different times
    » # of hardware machines
    » # of maintainers
• Cost down!!!

14 1.4 Cloud Computing System Architecture
Layers of a computing system:
• Application: user software, API
• Platform: OS, middleware
• Infrastructure: hardware

15 1.4 Cloud Computing System Architecture
Three cloud computing stacks (layers: Application, Platform, Infrastructure):
• SaaS: Software as a service
  – All layers are provided by the cloud computing provider
• PaaS: Platform as a service
  – Runs a cloud user's own application
• IaaS: Infrastructure as a service
  – Runs a cloud user's own application and OS

16 1.5 Cloud Computing Industry
• Amazon Elastic Compute Cloud (EC2)
• Microsoft's Windows Azure Platform
• Google App Engine
• Other small startups: Heroku & Engine Yard
• And more (and more)…

17 1.5 Cloud Computing Industry
• Amazon Elastic Compute Cloud (EC2)
  – Xen-based virtual computing environment
    » A user can run Linux-based applications
  – IaaS: a user can control nearly the entire software stack, from the kernel upwards
  – Provides a low level of virtualization
    » Raw CPU cycles, block-device storage, IP-level connectivity
• Microsoft's Windows Azure Platform
• Google App Engine
• Other small startups: Heroku & Engine Yard
• And more (and more)…

18 1.5 Cloud Computing Industry
• Amazon Elastic Compute Cloud (EC2)
• Microsoft's Windows Azure Platform
  – PaaS: uses the Windows Azure Hypervisor as the infrastructure and the .NET framework as the application container
  – Supports general-purpose computing
  – Users can choose the language, but cannot control the underlying operating system or runtime
• Google App Engine
• Other small startups: Heroku & Engine Yard
• And more (and more)…

19 1.5 Cloud Computing Industry
• Amazon Elastic Compute Cloud (EC2)
• Microsoft's Windows Azure Platform
• Google App Engine
  – Supports APIs for the Google Datastore, Google Accounts, URL fetch, image manipulation, email services, etc.
  – PaaS: an application domain-specific platform
    » Not suitable for general-purpose computing
• Other small startups: Heroku & Engine Yard
• And more (and more)…

20 1.5 Cloud Computing Industry
• Amazon Elastic Compute Cloud (EC2)
• Microsoft's Windows Azure Platform
• Google App Engine
• Other small startups: Heroku & Engine Yard
  – Based on Ruby on Rails
  – PaaS
• And more (and more)…

21 2 Virtualization Concept
• Virtualization refers to the act of creating a virtual (rather than actual) version of
  – computer hardware platform
  – operating system (OS)
  – storage device
  – database server
  – computer network resources
♦ We can virtualize physical servers to support all applications running over them
♦ Virtualization is the key to cloud computing

22 2.1 The Traditional Server Concept
• Web Server: Windows, IIS
• App Server: Linux, Glassfish
• DB Server: Linux, MySQL
• EMail: Windows, Exchange

23 2.1 The Traditional Server Concept
• System administrators often talk about servers as a whole unit
  – Including the hardware, the OS, the storage, and the applications
• Servers are often referred to by their functions
  – e.g., the Web servers, the SQL servers
  – If the SQL servers are overloaded, the administrator must add a new one

24 2.1 If Something Goes Wrong…
• Web Server: Windows, IIS
• App Server: DOWN!
• DB Server: Linux, MySQL
• EMail: Windows, Exchange

25 2.1 The Traditional Server Concept
• If a server fails
  – Unless there are redundant servers, the whole service is down
• The administrators can implement clusters of servers
  – This makes the service more fault tolerant
  – However, even clusters are not easily scalable
  – And resource utilization is typically low

26 2.2 The Virtual Server Concept
• A Virtual Machine Monitor (VMM) layer sits between the guest OS and the hardware

27 2.2 The Virtual Server Concept
• Virtual servers encapsulate the service software away from the hardware
  – A virtual server can be hosted on one or more hardware machines
  – One hardware machine can host more than one virtual server
[Figure labels: x86 Architecture; VMM (Virtual Machine Monitor), which intercepts hardware requests; Server 1 Guest OS; Server 2 Guest OS; Clustering Service Console]

28 2.2 Virtualized Vs. Traditional
• Traditional stack: Hardware, Operating System, App
• Virtualized stack: Hardware, Hypervisor, then one OS and App per virtual machine

29 2.2 The Virtual Server Concept
• A flexible way to host computing systems
  – Virtual servers are just files in the storage
    » They can be migrated from one machine to another easily
  – Hardware machines can be removed and introduced conveniently
  – Administrators can adjust the amount of resources allocated to each virtual server
    » To accommodate the service requirements

30 2.3 Virtualization Technique Support
• Software implementations
  – Commercial software
    » e.g., VMware
  – Open-source software
    » e.g., Xen, VirtualBox
• Hardware support
  – Intel VT (Intel Virtualization Technology)
  – AMD-V (AMD Virtualization)
• Virtualization is a well-established technique

31 3 Distributed Computing with MapReduce
• What is MapReduce?
  – A software framework introduced by Google
  – For distributed processing of large data sets on a cluster of computers
  – MapReduce libraries have been implemented in many languages
    » C++, C#, Erlang, Java, OCaml, Python, Ruby, F#, R, etc.

32 3.1 Example: Distributed Grep
[Figure: very big data set → split data → grep on each split → matches per split → cat → all matches]
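The split, grep, and cat pipeline in this figure can be sketched on a single machine. This is only an illustration under stated assumptions: a thread pool stands in for the cluster's worker nodes, and the function names and sample log lines are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, n_chunks):
    """Split the data set into roughly equal chunks of lines."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep_chunk(pattern, chunk):
    """The 'grep' step: return the lines of one chunk that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def distributed_grep(pattern, lines, n_workers=3):
    """Run grep on every chunk in parallel, then concatenate the matches."""
    chunks = split_lines(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda chunk: grep_chunk(pattern, chunk), chunks)
        # The 'cat' step: join per-chunk matches, preserving chunk order.
        return [line for matches in partial for line in matches]


# Hypothetical input data.
log = ["error: disk", "ok", "error: net", "ok", "warn", "error: cpu"]
print(distributed_grep("error", log))
```

Each chunk is processed independently, which is what allows the work to be spread over many machines in a real deployment.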

33 3.1 Example: Distributed Word Count
[Figure: very big data set → split data → count on each split → merge → merged count]
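The count-then-merge flow in this figure can be sketched with Python's standard collections.Counter. The helper names and the sample splits are assumptions made for illustration; in a real cluster each count_words call would run on a different machine.

```python
from collections import Counter


def count_words(chunk):
    """The 'count' step: count the words in one split of the data."""
    return Counter(word for line in chunk for word in line.split())


def merge_counts(partial_counts):
    """The 'merge' step: add up the per-split counts."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical data, already split into two chunks.
splits = [["the cat sat"], ["the dog sat", "the end"]]
merged = merge_counts(count_words(split) for split in splits)
print(merged)
```

Merging works because word counts are additive: the count for the whole data set is the sum of the counts for its splits.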

34 3.2 MapReduce Concept
[Figure labels: very big data set; split data; count (MAP); partitioning function; merge (REDUCE); results]
♦ Map:
  – A master node takes the input, chops it up into smaller sub-problems, and distributes them to worker nodes
  – A worker node processes a sub-problem and passes the answer back to its master node
♦ Reduce:
  – The master node takes the answers to all the sub-problems and combines them to get the answer to the problem it was originally trying to solve

35 3.2 Logical View of MapReduce
• Map is defined with respect to data structured in (key, value) pairs
• Map is applied in parallel to every item in the input dataset
• Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain
  – Map(k1, v1) → list(k2, v2)
• Then, the MapReduce framework collects all pairs with the same key from all lists
  – It creates one group for each of the different generated keys

36 3.2 Map Example
• Example: counting the number of occurrences of each word in a large collection of documents
  – The Map function takes (name, document)
    » name: the name of a document in the document collection
    » document: the document contents, consisting of a set of words
  – Map is applied to each document
  – Each invocation returns a list of (word, "1") pairs
  – MapReduce then groups the results according to the key (the word in this example)
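A minimal sketch of this Map step and the grouping that follows it is given below. The function names and the two sample documents are hypothetical, and the sketch emits the integer 1 rather than the string "1" so the values can be summed directly later.

```python
from collections import defaultdict


def map_word_count(name, document):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in one document."""
    return [(word, 1) for word in document.split()]


def group_by_key(pairs):
    """The framework's grouping step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Hypothetical document collection: name -> contents.
documents = {"a.txt": "to be or not to be", "b.txt": "to do"}

pairs = [
    pair
    for name, document in documents.items()
    for pair in map_word_count(name, document)
]
grouped = group_by_key(pairs)
print(grouped)
```

Note that the document name is not used inside the Map function in this example; it is only the input key.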

37 3.2 Logical View of MapReduce
• Reduce is also defined with respect to data structured in (key, value) pairs
• Reduce is applied in parallel to each group generated in the Map step
• Reduce produces a collection of values in the same domain
  – Reduce(k2, list(v2)) → list(v3)
• The MapReduce framework in this way transforms a list of (key, value) pairs into a list of values

38 3.2 Reduce Example
• Example: counting the number of occurrences of each word in a large collection of documents
  – The Reduce function takes (word, partialCount)
    » word: the word being processed
    » partialCount: a list where each item is produced by a Map call
  – So for each word, Reduce returns its occurrence count
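The Reduce step for word counting can be sketched as follows. The function name and the grouped input are hypothetical; the input is assumed to be the output of the framework's grouping step, with integer counts.

```python
def reduce_word_count(word, partial_counts):
    """Reduce(k2, list(v2)) -> list(v3): sum the partial counts for one word."""
    return [sum(partial_counts)]


# Hypothetical grouped output of the Map step: word -> list of partial counts.
grouped = {"to": [1, 1, 1], "be": [1, 1], "do": [1]}

totals = {
    word: reduce_word_count(word, counts)[0]
    for word, counts in grouped.items()
}
print(totals)
```

Because each word's group is reduced independently, the calls to reduce_word_count could run in parallel on different workers.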

39 3.3 When To Use MapReduce?
• You have a cluster of computers
• You are working with a large dataset
• You are working with independent data (or data assumed to be independent)
• The data processing can be cast into Map and Reduce operations

40 3.4 MapReduce Implementations
• MapReduce by Google Inc.
  – Central to Google's search engine
    » Generates Google's index of the World Wide Web
• Open-source implementations
  – Hadoop
    » Yahoo! Search Webmap is a Hadoop application that produces data used in every Yahoo! Web search query
  – And many others
    » Disco, Skynet, …

41 4 Cloud Computing In Practice (Amazon AWS)
• What is Amazon AWS (Amazon Web Services)?
  – AWS is a collection of remote computing services that together make up a cloud computing platform, offered over the Internet by Amazon.com

42 4.1 AWS Architecture

43 4.1 AWS Functional Ecosphere

44 4.1 How AWS Works: A Film Rendering Case

45 4.2 EC2 Instances
• AMI instances
  – General purpose instances (M1 or M3)
  – Compute-optimized instances (C1 or CC2 or C3)
  – GPU instances (G2 or CG1)
  – Memory-optimized instances (M2 or CR1)
  – Storage-optimized instances (HS1 or I2 or HI1)
  – Micro instances
• Amazon Machine Image (AMI)

46 4.2 EC2 Pricing Strategy

47 4.2 EC2 Management From Command Line
• Sign up for AWS
• Apply for AWS security credentials
• Download and install the EC2 command line tools
• Set up system environment variables for EC2
• Set up the .profile configuration file
• Check available AMIs
• Generate a key pair
• Initialize and start up the selected AMI
• Authorize access to ports and connect to the AMI instance
• Terminate the instance when the work is done
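The last four steps (key pair, start, authorize, terminate) can also be scripted. The sketch below swaps the EC2 command line tools for the boto3 Python SDK, which is a substitution and not the method the slide describes. The function name, AMI ID, key name, instance type, and CIDR range are all hypothetical. The function takes the client as a parameter, so boto3 itself is only needed when it is called against a real account.

```python
def run_ami_once(ec2, ami_id, key_name, instance_type="t2.micro"):
    """Generate a key pair, start one instance, open SSH, then terminate it.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    # Generate a key pair; key["KeyMaterial"] holds the private key text.
    key = ec2.create_key_pair(KeyName=key_name)

    # Initialize and start up exactly one instance of the selected AMI.
    reservation = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        KeyName=key_name,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # Authorize SSH access (port 22) from a hypothetical address range.
    ec2.authorize_security_group_ingress(
        GroupName="default",
        IpProtocol="tcp",
        FromPort=22,
        ToPort=22,
        CidrIp="203.0.113.0/24",
    )

    # ... connect over SSH using key["KeyMaterial"] and do the work here ...

    # Terminate the instance when the work is done, so billing stops.
    ec2.terminate_instances(InstanceIds=[instance_id])
    return instance_id
```

With real credentials configured, a call would look like run_ami_once(boto3.client("ec2"), "ami-00000000", "demo-key") after importing boto3; the AMI ID shown is a placeholder. The sketch has no error handling: for example, the authorize call fails if the same rule already exists in the security group.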

48 4.3 AWS S3
• S3 = Simple Storage Service
• Designed to be reliable, secure, and scalable
• Simple {key, value} storage
• Keys are stored within buckets
• Default storage mechanism for AWS
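The {key, value} model can be illustrated with two calls from the boto3 S3 client, put_object and get_object. This is a minimal sketch: the function name, bucket name, and key are hypothetical, and the client is passed in as a parameter so the logic can be read without an AWS account.

```python
def save_and_load(s3, bucket, key, data):
    """Store bytes under `key` in `bucket`, then read them back.

    `s3` is expected to behave like boto3.client("s3").
    """
    # The value is stored under a key, and the key lives inside a bucket.
    s3.put_object(Bucket=bucket, Key=key, Body=data)

    # Reading uses the same (bucket, key) pair; the body is a stream.
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

With real credentials and an existing bucket, a call would look like save_and_load(boto3.client("s3"), "demo-bucket", "reports/a.txt", b"hello"). Keys may contain slashes, but S3 treats the whole string as one flat key within the bucket.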

49 4.4 AWS SQS
• SQS = Simple Queue Service
  – Amazon SQS offers reliable and scalable hosted queues for storing messages as they travel between components or nodes in distributed systems
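How a queue decouples two components can be sketched with three calls from the boto3 SQS client: send_message, receive_message, and delete_message. The function names, the queue URL, and the message text are hypothetical, and the client is passed in as a parameter.

```python
def send_task(sqs, queue_url, body):
    """Producer side: put one message on the queue."""
    sqs.send_message(QueueUrl=queue_url, MessageBody=body)


def process_one(sqs, queue_url, handler):
    """Consumer side: receive one message, handle it, then delete it.

    Returns True if a message was processed, False if the queue was empty.
    `sqs` is expected to behave like boto3.client("sqs").
    """
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    messages = response.get("Messages", [])
    if not messages:
        return False
    message = messages[0]
    handler(message["Body"])
    # Delete only after the handler succeeded; if the handler raises,
    # the message stays on the queue and can be delivered again.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

The producer and the consumer never call each other directly; they only share the queue URL, so either side can be scaled or restarted independently.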

50 4.6 Programming Support for AWS
• Official Amazon libraries for Java
• Unofficial libraries for .NET, Ruby, and Perl
• AWS4C libraries for C/C++/Objective-C
• Boto libraries for Python
  – The most popular Amazon AWS library, supporting the full breadth and depth of Amazon Web Services
  – Also supports other public services such as Google Storage, as well as private cloud systems like Eucalyptus, OpenStack, and OpenNebula

51 5 Summary
• Cloud computing can provide elastic Internet-based services
  – It hosts computing systems in an on-demand, scalable manner
  – The purpose is to reduce cost
• Cloud computing systems are typically implemented as service-based systems: IaaS, PaaS, SaaS
• Virtualization technology is key to implementing cloud computing systems
• MapReduce is an emerging computing paradigm that can be implemented with cloud computing systems
• Amazon AWS is a commercial cloud computing service

53 Links
[1] Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/
[2] Windows Azure, Microsoft's Cloud Service Platform. http://www.microsoft.com/windowsazure/
[3] Google Apps. http://www.google.com/apps/
[4] Heroku, Ruby Cloud Platform as a Service. http://heroku.com/
[5] Engine Yard: The Ruby Cloud. http://www.engineyard.com/
[6] VMware Virtualization Software for Desktops, Servers & Virtual Machines for Public and Private Cloud Solutions. http://www.vmware.com/
[7] Xen Projects. http://www.xen.org/
[8] Hadoop. http://hadoop.apache.org/
[9] Disco Project. http://discoproject.org/
[10] Skynet, A Ruby MapReduce Framework. http://skynet.rubyforge.org/

