Lecture 1 Book: Hadoop in Action by Chuck Lam Online course – “Cloud Computing Concepts” lecture notes by Indranil Gupta.


1 Lecture 1 Book: Hadoop in Action by Chuck Lam Online course – “Cloud Computing Concepts” lecture notes by Indranil Gupta

2 Content Introduction Clouds MapReduce Understanding Hadoop and MapReduce

3 Grading policy: Attendance – 10%, Quizzes – 20%, Midterm – 20%, Assignments – 20%, Final – 30%. Total: 100 points

4 Many Cloud Providers AWS: Amazon Web Services (EC2: Elastic Compute Cloud; S3: Simple Storage Service; EBS: Elastic Block Store). Microsoft Azure. Google Compute Engine. Rightscale, Salesforce, EMC, Gigaspaces, 10gen, Datastax, Oracle, VMWare, Yahoo, Cloudera. And many, many more!

5 Two Categories of Clouds A cloud can be either (i) a public cloud or (ii) a private cloud. Private clouds are accessible only to company employees. Public clouds provide service to any paying customer: Amazon S3 (Simple Storage Service), Amazon EC2 (Elastic Compute Cloud), Google App Engine/Compute Engine

6 What is a Cloud? It’s a cluster! It’s a supercomputer! It’s a datastore! It’s a Superman! None of the above All of the above Cloud = Lots of storage + computing cycles nearby

7 What is a Cloud? A single-site cloud (aka “datacenter”) consists of: compute nodes (grouped into racks); switches connecting the racks; a network topology, e.g. hierarchical; storage nodes connected to the network; a front-end for submitting jobs and receiving client requests; software services. A geographically distributed cloud consists of multiple such sites, each site perhaps with a different structure and services

8 A Cloudy history of Time

9 On-demand Access: *aaS On-demand: renting a cab vs. renting a car or buying one. HaaS: Hardware as a Service. Access to barebones hardware machines. Not always a good idea because of security risks. IaaS: Infrastructure as a Service. Access to flexible computing and storage infrastructure. Ex: Amazon Web Services (AWS: EC2 and S3). PaaS: Platform as a Service. Access to flexible computing and storage infrastructure, coupled with a software platform. SaaS: Software as a Service. Access to software services (Service Oriented Architectures). Ex: Google Docs, MS Office on demand

10 A Cloud... A cloud consists of: hundreds to thousands of machines in a datacenter (server side); thousands to millions of machines accessing these services (client side). Servers communicate amongst one another. Clients communicate with servers. Clients also communicate with each other

11 A Cloud... IS a Distributed System. Servers communicate amongst one another -> Distributed System. Essentially a cluster! Clients communicate with servers: also a distributed system! Clients may also communicate with each other, in peer-to-peer systems like BitTorrent: also a distributed system!

12 Four Features of Clouds = All Distributed Systems Features! I. Massive Scale: many servers. II. On-demand Nature: access (multiple) servers anywhere. III. Data-Intensive Nature: lots of data => need a cluster (multiple machines) to store it. IV. New Cloud Programming Paradigms: Hadoop/MapReduce, NoSQL all need clusters

13 Distributed System = Many Processes Sending and Receiving Messages
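To make the "processes sending and receiving messages" picture concrete, here is a minimal sketch in Python. It is an illustration only, not something from the lecture: the "processes" are simulated with threads inside one program, each with its own inbox queue, and the ring layout, the `run_ring` name, and the hop counter are all assumptions made for the example.

```python
import queue
import threading


def run_ring(num_processes=3):
    """Pass a token once around a ring of simulated processes.

    Each simulated process (a thread here, purely for illustration)
    owns an inbox. It blocks until a message arrives, records it,
    and forwards it to its neighbour unless the token has already
    visited every process.
    """
    inboxes = [queue.Queue() for _ in range(num_processes)]
    log = []
    log_lock = threading.Lock()

    def process(pid):
        hops = inboxes[pid].get()           # receive: block until a message arrives
        with log_lock:
            log.append((pid, hops))         # record (who received, hop count)
        if hops + 1 < num_processes:        # token has not visited everyone yet
            next_pid = (pid + 1) % num_processes
            inboxes[next_pid].put(hops + 1)  # send to the next process in the ring

    threads = [threading.Thread(target=process, args=(pid,))
               for pid in range(num_processes)]
    for t in threads:
        t.start()
    inboxes[0].put(0)                       # inject the first message
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_ring(3))
```

No process shares state with another except through messages, which is the property that carries over to real clusters, where the inboxes would be network sockets rather than in-memory queues.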

14 Many Challenges Abound... Failures: no longer the exception, but rather the norm. Scalability: 1000s of machines, terabytes of data. Asynchrony: clock skew and clock drift. Concurrency: 1000s of machines interacting with each other and accessing the same data...

15 Hadoop Doug Cutting saw an opportunity and led the charge to develop an open source version of the MapReduce system, called Hadoop. Today, Hadoop is a core part of the computing infrastructure for many web companies, such as Yahoo, Facebook, LinkedIn, and Twitter. An effective programmer today must have knowledge of relational databases, networking, and security, all of which were considered optional skills a couple of decades ago. Similarly, a basic understanding of distributed data processing will soon become an essential part of every programmer’s toolbox.

16 What is MapReduce

17 Map

18 Reduce
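The transcript keeps only the titles of the MapReduce, Map, and Reduce slides, so the following is a hedged, single-machine sketch of the general idea rather than the lecture's own example. It uses word counting as the task; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and this is plain Python, not the Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud is a cluster", "the cluster"]))
```

In a real cluster the map calls would run in parallel on different machines, each over its own slice of the input, and the grouping step would move data across the network; here everything runs in one process to keep the data flow visible.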

19

20 Thank You

