Download presentation
Presentation is loading. Please wait.
Published byLillian Patrick Modified over 8 years ago
1
© Chinese University, CSE Dept. Distributed Systems / 10 - 1 Distributed Systems Topic 10: Cloud Computing Systems Dr. Michael R. Lyu Computer Science & Engineering Department The Chinese University of Hong Kong
2
© Chinese University, CSE Dept. Distributed Systems / 10 - 2 Outline 1. Cloud Computing Introduction 2. Virtualization Concept 3. MapReduce Computing Paradigm 4. Cloud Computing In Practice (Amazon AWS) 5. Summary
3
© Chinese University, CSE Dept. Distributed Systems / 10 - 3 1 What Is Cloud Computing? Cloud computing is a hot buzzword recently –More and more companies have claimed their products are cloud computing based So, what is cloud computing –A new name of an old technology created by marketing people? –A new technology focusing on higher-speed computing? –A new programming paradigm?
4
© Chinese University, CSE Dept. Distributed Systems / 10 - 4 1 What Is Cloud Computing? Let us see what cloud computing is by its economic drive –See an example on hosting an Internet service »e.g., a website CUHK co. ltd. Server
5
© Chinese University, CSE Dept. Distributed Systems / 10 - 5 1.1 Hosting Computing Systems CUHK co. ltd. A single server is enough for small business A small amount of budget
6
© Chinese University, CSE Dept. Distributed Systems / 10 - 6 1.1 Hosting Computing Systems CUHK co. ltd. More servers are needed if you have more clients Cost more manpower and money to provide the service
7
© Chinese University, CSE Dept. Distributed Systems / 10 - 7 1.1 Hosting Computing Systems CUHK co. ltd. A huge data center is needed if you have a huge number of clients Cost a huge amount of money to provide the service
8
© Chinese University, CSE Dept. Distributed Systems / 10 - 8 1.1 Hosting Computing Systems CUHK co. ltd. A huge data center is needed if you have a huge number of clients Cost a huge amount of money to provide the service
9
© Chinese University, CSE Dept. Distributed Systems / 10 - 9 1.1 Hosting Computing Systems The number of concurrent users is dynamic –You need to prepare for the peak number »Otherwise, bad user experience! –So you have to pay for the servers no matter whether you are using them or not »The money you spend to buy the hardware »The salary for the maintainers »The electricity (including that for air conditioning) »Other maintenance costs, e.g., repairing, upgrading, renting a room for the servers
10
© Chinese University, CSE Dept. Distributed Systems / 10 - 10 1.2 Cloud Computing Motivations How to make it more economic when hosting computing systems CUHK co. ltd. Server I wish this cloud can really host my system so that I can provide a scalable, on-demand service
11
© Chinese University, CSE Dept. Distributed Systems / 10 - 11 1.2 Cloud Computing Introduction Hosting computing systems in a scalable, on- demand way –To be economic! Cloud computing –An emerging computing concept that approaches such a goal –Based on a lot of distributed system techniques »Service-oriented system architecture »Virtualization technology »Distributed computing paradigm
12
© Chinese University, CSE Dept. Distributed Systems / 10 - 12 1.3 Pay As You Go The core of cloud computing is computing/ storage outsourcing –To another company –To a dedicating unit in the same organization vs Pay for exactly what you’ve used Pay as you go!! Cost Down!!!
13
© Chinese University, CSE Dept. Distributed Systems / 10 - 13 1.3 Outsourcing Electronics industry Brand vendor Manufacturer Manufacturing outsourcing Different manufacturing requirements in low season and high season # of product lines # of workers Service vendor Cloud computing provider Computing and storage outsourcing Different service requirements in different times # of hardware machines # of maintainers Cost down!!! Services for end users Cloud computing
14
© Chinese University, CSE Dept. Distributed Systems / 10 - 14 1.4 Cloud Computing System Architecture Computing System Application Platform Infrastructure Hardware OS, middleware User software, API
15
© Chinese University, CSE Dept. Distributed Systems / 10 - 15 1.4 Cloud Computing System Architecture Application Platform Infrastructure SaaS: Software as a service PaaS: Platform as a service Provided by the cloud computing providers IaaS: Infrastructure as a service A cloud user’s own application and OS A cloud user’s own application Three cloud computing stacks
16
© Chinese University, CSE Dept. Distributed Systems / 10 - 16 1.5 Cloud Computing Industry Amazon Elastic Compute Cloud (EC2) Microsoft's Windows Azure Platform Google App Engine Other small startups: Heroku & Engine Yard And more (and more)…
17
© Chinese University, CSE Dept. Distributed Systems / 10 - 17 1.5 Cloud Computing Industry Amazon Elastic Compute Cloud (EC2) –Xen-based virtual computing environment »A user can run Linux-based applications –IaaS: A user can control nearly the entire software stack, from the kernel upwards. –Provides low level of virtualization »Raw CPU cycles, block-device storage, IP-level connectivity Microsoft's Windows Azure Platform Google App Engine Other small startups: Heroku & Engine Yard And more (and more)…
18
© Chinese University, CSE Dept. Distributed Systems / 10 - 18 1.5 Cloud Computing Industry Amazon Elastic Compute Cloud (EC2) Microsoft's Windows Azure Platform –PaaS: Use Windows Azure Hypervisor as the infrastructure, and use the.NET framework as the application container –Support general-purpose computing –Users can choose language, but cannot control the underlying operating system or runtime Google App Engine Other small startups: Heroku & Engine Yard And more (and more)…
19
© Chinese University, CSE Dept. Distributed Systems / 10 - 19 1.5 Cloud Computing Industry Amazon Elastic Compute Cloud (EC2) Microsoft's Windows Azure Platform Google App Engine –Support APIs for the Google Datastore, Google Accounts, URL fetch, image manipulation, and email services, etc. –SaaS: Application domain-specific platforms »Not suitable for general-purpose computing Other small startups: Heroku & Engine Yard And more (and more)…
20
© Chinese University, CSE Dept. Distributed Systems / 10 - 20 1.5 Cloud Computing Industry Amazon Elastic Compute Cloud (EC2) Microsoft's Windows Azure Platform Google App Engine Other small startups: Heroku & Engine Yard –Based on Ruby on Rails –PaaS And more (and more)…
21
© Chinese University, CSE Dept. Distributed Systems / 10 - 21 2 Virtualization Concept Virtualization refers to the act of creating a virtual (rather than actual) version of –computer hardware platform –operating system (OS) –storage device –database server –computer network resources ♦We can virtualize physical servers to support all applications running over them ♦Virtualization of the key to Cloud Computing
22
© Chinese University, CSE Dept. Distributed Systems / 10 - 22 2.1 The Traditional Server Concept Web Server Windows IIS App Server Linux Glassfish DB Server Linux MySQL EMail Windows Exchange
23
© Chinese University, CSE Dept. Distributed Systems / 10 - 23 2.1 The Traditional Server Concept System administrators often talk about servers as a whole unit –Including the hardware, the OS, the storage, and the applications Servers are often referred to by their functions –e.g., the Web servers, the SQL servers –If the SQL servers are overloaded, the administrator must add in a new one
24
© Chinese University, CSE Dept. Distributed Systems / 10 - 24 2.1 If Something Going Wrong… Web Server Windows IIS App Server DOWN! DB Server Linux MySQL EMail Windows Exchange
25
© Chinese University, CSE Dept. Distributed Systems / 10 - 25 2.1 The Traditional Server Concept If server failure is experienced –Unless there are redundant servers, the whole service is down The administrators can implement clusters of servers –Make the service more fault tolerant –However, even clusters are not scalable –And, resource utilization is typically low
26
© Chinese University, CSE Dept. Distributed Systems / 10 - 26 2.2 The Virtual Server Concept Virtual Machine Monitor (VMM) layer between Guest OS and hardware
27
© Chinese University, CSE Dept. Distributed Systems / 10 - 27 2.2 The Virtual Server Concept Virtual servers encapsulate the service software away from the hardware –A virtual server can be hosted in one or more hardware machines –One hardware machine can host more than one virtual server x86 Architecture VMM (Virtual Machine Monitor) Server 1 Guest OS Server 2 Guest OS Clustering Service Console Intercepts hardware requests
28
© Chinese University, CSE Dept. Distributed Systems / 10 - 28 2.2 Virtualized Vs. Traditional Hardware Operating System App Traditional Stack Hardware OS App Hypervisor OS Virtualized Stack
29
© Chinese University, CSE Dept. Distributed Systems / 10 - 29 2.2 The Virtual Server Concept A flexible way to host computing systems –Virtual servers are just files in the storage »They can be migrated from one machine to another easily –Hardware machines can be removed and introduced conveniently –Administrators can adjust the amount of resources allocated to each virtual server »Accommodate to the service requirements
30
© Chinese University, CSE Dept. Distributed Systems / 10 - 30 2.3 Virtualization Technique Supports Software implementations –Commercial software »e.g., VMWare –Open-source software »e.g., XEN, VirtualBox Hardware support –Intel VT (Intel Virtualization Technology) –AMD-V (AMD-Virtualization) Virtualization is a well-established technique
31
© Chinese University, CSE Dept. Distributed Systems / 10 - 31 3 Distributed Computing with MapReduce What is MapReduce? –A software framework introduced by Google –For distributed processing large data sets on a cluster of computers –MapReduce libraries have been extensively implemented »C++, C#, Erlang, Java, Ocaml, Python, Ruby, F#, R, etc
32
© Chinese University, CSE Dept. Distributed Systems / 10 - 32 3.1 Example: Distributed Grep grep Very big data set matches Split data matches grep cat All matches
33
© Chinese University, CSE Dept. Distributed Systems / 10 - 33 3.1 Example: Distributed Word Count count Very big data set count Split data count merge Merged count
34
© Chinese University, CSE Dept. Distributed Systems / 10 - 34 3.2 MapReduce Concept count Very big data set count Split data count merge Results MAPMAP REDUCEREDUCE Partitioning Function ♦Map : –A master node takes the input, chops it up into smaller sub- problems, and distributes them to worker nodes –A worker node processes a sub- problem, and passes the answer back to its master node ♦Reduce : –The master node takes the answers to all the sub-problems and combines them to get the answer to the problem it was originally trying to solve.
35
© Chinese University, CSE Dept. Distributed Systems / 10 - 35 3.2 Logical View of MapReduce Map is defined with respect to data structured in (key, value) pairs Map is applied in parallel to every item in the input dataset Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain –Map(k1,v1) list(k2,v2) Then, the MapReduce framework collects all pairs with the same key from all lists –Create one group for each one of the different generated keys
36
© Chinese University, CSE Dept. Distributed Systems / 10 - 36 3.2 Map Example Example: counting the number of occurrences of each word in a large collection of documents –name: name of a document in the document collection –document: document contents consisting of a set of words –Applied to each document –Each invocation gets a list of (word, “1”) pairs –MapReduce then group the results according to the key (word in this example)
37
© Chinese University, CSE Dept. Distributed Systems / 10 - 37 3.2 Logical View of MapReduce Reduce is also defined with respect to data structured in (key, value) pairs Reduce is applied in parallel to each group generated in the Map step Reduce produces a collection of values in the same domain –Reduce(k2, list (v2)) list(v3) The MapReduce framework in this way transforms a list of (key, value) pairs into a list of values
38
© Chinese University, CSE Dept. Distributed Systems / 10 - 38 3.2 Reduce Example Example: counting the number of occurrences of each word in a large collection of documents –word: the word being processed –partialCount: a list where each item is produced by a Map call –So for each word, Reduce gets its occurrence count
39
© Chinese University, CSE Dept. Distributed Systems / 10 - 39 3.3 When To Use MapReduce? You have a cluster of computers You are working with large dataset You are working with independent data (or assumed) The data processing can be casted into Map and Reduce operations
40
© Chinese University, CSE Dept. Distributed Systems / 10 - 40 3.4 MapReduce Implementations MapReduce by Google Inc. –Centric to Google’s searching engine »Generate Google's index of the World Wide Web Open-source implementations –Hadoop »Yahoo! Search Webmap is a Hadoop application to produce data used in every Yahoo! Web search query –And many others »Disco, Skynet, …
41
© Chinese University, CSE Dept. Distributed Systems / 10 - 41 4 Cloud Computing In Practice (Amazon AWS) What is Amazon AWS (Amazon Web Service) –AWS is a collection of remote computing services that together make up a cloud computing platform, offered over the Internet by Amazon.com.
42
© Chinese University, CSE Dept. Distributed Systems / 10 - 42 4.1 AWS Architecture
43
© Chinese University, CSE Dept. Distributed Systems / 10 - 43 4.1 AWS Functional Ecosphere
44
© Chinese University, CSE Dept. Distributed Systems / 10 - 44 4.1 How AWS Works? A Film Rending Case
45
© Chinese University, CSE Dept. Distributed Systems / 10 - 45 4.2 EC2 Instances AMI Instances –General purpose instances (M1 or M3) –Compute-optimized instances (C1 or CC2 or C3) –GPU instances (G2 or CG1) –Memory-optimized instances (M2 or CR1) –Storage-optimized instances (HS1 or I2 or HI1) –Micro Instances Amazon Machine Image (AMI)
46
© Chinese University, CSE Dept. Distributed Systems / 10 - 46 4.2 EC2 Pricing Strategy
47
© Chinese University, CSE Dept. Distributed Systems / 10 - 47 4.2 EC2 Management From Command Line Sign up for AWS Apply AWS security credentials Download and Install EC2 command line tools Set up system environment variables for EC2 Set up.profile configuration file Check available AMIs Generate key pair Initialize and startup selected AMI Authorize accesses to ports and connect to AMI instance Terminate the instance when work is done
48
© Chinese University, CSE Dept. Distributed Systems / 10 - 48 4.3 AWS S3 S3 = Simple Storage Service Guaranteed to be reliable, secure and scalable Simple {Key, Value} storage Keys are stored within buckets Default Storage Mechanism for AWS
49
© Chinese University, CSE Dept. Distributed Systems / 10 - 49 4.4 AWS SQS SQS = Simple Queue System –Amazon SQS offers reliable and scalable hosted queues for storing messages as they travel between components or nodes in distributed systems.
50
© Chinese University, CSE Dept. Distributed Systems / 10 - 50 4.6 Programming Support for AWS Official Amazon Libraries for Java Unofficial Libraries for.Net, Ruby and Perl AWS4C Libraries for C/C++/Objective C Boto Libraries for Python –Most popular Amazon AWS library to support the full breadth and depth of Amazon Web Services –Support for other public services such as Google Storage in addition to private cloud systems like Eucalyptus, OpenStack and Open Nebula.
51
© Chinese University, CSE Dept. Distributed Systems / 10 - 51 5 Summary Cloud computing can provide elastic Internet-based services –Host computing systems in an on-demand, scalable manner –For cost-down purpose Cloud computing systems are typically implemented as service-based systems: IaaS, PaaS, SaaS. Virtualization technology is key to implement cloud computing systems MapReduce is an emerging computing paradigm that can be implemented with cloud computing systems Amazon AWS as commercial cloud computing service
52
© Chinese University, CSE Dept. Distributed Systems / 10 - 52 References [1] M. Armbrust et al, A view of cloud computing, Communications of the ACM, 53:4, pp.50-58, 2010. http://cacm.acm.org/magazines/2010/4/81493-a-view- of-cloud-computing/fulltext [2] L. Vaquero, A break in the clouds: towards a cloud definition, ACM SIGCOMM Computer Communication Review, 39:1, 2009. http://ccr.sigcomm.org/drupal/files/p50-v39n1l-vaqueroA.pdf [3] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, in Proc. of the 6th USENIX Symposium on Operating Systems Design & Implementation, 2004. http://labs.google.com/papers/mapreduce-osdi04.pdf [4] Paul Barham, Xen and the art of virtualization, in Proc. of the 19th ACM Symposium on Operating Systems Principles, 2003. http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf
53
© Chinese University, CSE Dept. Distributed Systems / 10 - 53 Links [1] Amazon Elastic Compute Cloud (Amazon EC2), http://aws.amazon.com/ec2/ [2] Windows Azure,Microsoft’s Cloud Service Platform, http://www.microsoft.com/windowsazure/ [3] Google Apps. http://www.google.com/apps/ [4] Heroku, Ruby Cloud Platform as a Service. http://heroku.com/ [5] Engine Yard: The Ruby Cloud. http://www.engineyard.com/ [6] VMware Virtualization Software for Desktops, Servers & Virtual Machines for Public and Private Cloud Solutions. http://www.vmware.com/ [7] Xen Projects. http://www.xen.org/ [8] Hadoop. http://hadoop.apache.org/ [9] Disco Project. http://discoproject.org/ [10] Skynet, A Ruby MapReduce Framework. http://skynet.rubyforge.org/
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.