Published by Janiya Hollin. Modified over 9 years ago.
1
© 2013 A. Haeberlen, Z. Ives Cloud Storage & Case Studies NETS 212: Scalable & Cloud Computing Fall 2014 Z. Ives University of Pennsylvania 1
2
© 2013 A. Haeberlen, Z. Ives Specialized KVS Cloud KVS are often specialized for a particular tradeoff or usage scenario Example: Amazon’s solutions Simple Storage Service (S3): large objects – files, virtual machines, etc. assumes objects change infrequently objects are opaque to the storage system SimpleDB (old), DynamoDB (newer replacement): small objects – Java objects, records, etc. generally updated more frequently; greater need for consistency generally multiple attributes or properties, which are exposed to the storage system 2
3
Big Objects: Amazon S3
S3 = Simple Storage Service
Stores large objects (= values) that may have access permissions
  objects: named items stored in S3
  buckets of objects: think of these as volumes in a filesystem
  the console includes a notion of folders, but these are not intrinsic to S3
Accessed via REST/SOAP
  But we’ll be using Java(script) libraries to interact with S3
  You’ll just call them as normal functions, but they will open and close sockets as necessary
  http://bitbucket.org/jmurty/jets3t/wiki/Home
  http://aws.amazon.com/sdkfornodejs/
4
S3: Access permissions
Permissions are assigned through Access Control Lists (ACLs)
  Essentially, a list of user/group permissions
  Bucket permissions are inherited by objects unless overridden at the object level
What can you control?
  Can be at the level of buckets or individual objects
  Available rights: read, write, read ACL, write ACL
  Possible grantees: everyone, authenticated users, specific users (by AWS account email address)
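
The permission check described on this slide can be sketched in a few lines. This is a toy in-memory model, not the S3 API: the `Acl` class, the grantee labels, and the `effective_acl` helper are all hypothetical names, and the override rule simply follows the slide's statement that object-level permissions replace the bucket's.

```python
from dataclasses import dataclass, field

# Hypothetical grantee labels, loosely modelled on the slide's list of grantees.
EVERYONE = "everyone"
AUTHENTICATED = "authenticated-users"


@dataclass
class Acl:
    # Maps a grantee (group label or account email) to a set of rights.
    grants: dict = field(default_factory=dict)

    def allows(self, user, right):
        """Return True if `user` (None means anonymous) holds `right`."""
        if right in self.grants.get(EVERYONE, set()):
            return True
        if user is not None and right in self.grants.get(AUTHENTICATED, set()):
            return True
        return user is not None and right in self.grants.get(user, set())


def effective_acl(bucket_acl, object_acl=None):
    """An object-level ACL overrides the bucket ACL, as the slide describes."""
    return object_acl if object_acl is not None else bucket_acl


bucket_acl = Acl({"alice@example.com": {"read", "write"}})
public_object_acl = Acl({EVERYONE: {"read"}})

print(effective_acl(bucket_acl).allows("bob@example.com", "read"))        # False
print(effective_acl(bucket_acl, public_object_acl).allows(None, "read"))  # True
```

Note the consequence of overriding: once an object carries its own ACL, the bucket-level grant to alice no longer applies to that object.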
5
S3: Uploading an object
Step 1: Hit 'upload' in the management console
Step 2: Select files
Step 3: Set metadata (or accept default)
Step 4: Set permissions (or make public)
7
S3: Pricing and usage, over a year…
[Screenshots of http://aws.amazon.com/s3/ taken on 9/19/2013 and 9/18/2014]
8
© 2013 A. Haeberlen, Z. Ives S3: Bucket operations Create bucket (optionally versioned; see later) Delete bucket List all keys in bucket (may not be 100% up to date) Modify bucket permissions 8 Source: Amazon S3 User’s Guide
9
© 2013 A. Haeberlen, Z. Ives S3: Object operations PUT object in bucket GET object from bucket DELETE object from bucket Modify object permissions The key issue: How do we manage concurrent updates? Will I see objects you delete? the latest version? etc. 9
10
S3: Consistency models
Consistency model depends on the region
  US West, EU, Asia Pacific, S. America: read-after-write consistency for PUTs of new objects, and eventual consistency for overwrite PUTs and DELETEs
  S3 buckets in the US Standard Region: eventual consistency
Read-after-write consistency:
  Each read or write operation becomes effective at some point between its start time and its completion time
  Reads return the value of the last effective write
[Figure: timeline with Client 1 and Client 2, showing writes W1: Cat and W2: Dog and reads R1 and R2]
11
© 2013 A. Haeberlen, Z. Ives S3: Versioning S3 handles consistency through versioning rather than locking The idea: every bucket + key maps to a list of versions [bucket+key] [object v1] [object v2] [object v3] … Each time we PUT an object, it gets a new version The last-received PUT overwrites any previous ones! When we GET: An unversioned request likely receives the last version – but this is not guaranteed depending on propagation delays A request for bucket + key + version uniquely maps to a single object! Versioning can be enabled for each bucket Why would you (not) want versioning? 11
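
A minimal sketch of the "bucket + key maps to a list of versions" idea, assuming a single in-memory replica. The `VersionedBucket` class and its `v1`, `v2`, ... version ids are hypothetical; real S3 version ids are opaque strings, and propagation delays are not modelled here.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model: each key maps to an append-only list of versions."""

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, value)
        self._ids = itertools.count(1)   # hypothetical version-id generator

    def put(self, key, value):
        """Every PUT appends a new version instead of overwriting in place."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, value))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            # Unversioned GET: the latest version known to this replica.
            return versions[-1][1]
        # Versioned GET: bucket + key + version names exactly one object.
        return dict(versions)[version_id]


bucket = VersionedBucket()
first = bucket.put("pets/favorite", "Cat")
second = bucket.put("pets/favorite", "Dog")

print(bucket.get("pets/favorite"))         # Dog
print(bucket.get("pets/favorite", first))  # Cat
```

In a distributed deployment the unversioned GET could return an older version on a replica that has not yet seen the latest PUT, which is the caveat the slide raises.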
12
© 2013 A. Haeberlen, Z. Ives Recap: Amazon S3 A key-value store for large objects Buckets, keys, objects, folders Various ways to access objects, e.g., HTTP and BitTorrent Provides eventual consistency +/- a few details that depend on the region Supports versioning and access control Access control is based on ACLs 12University of Pennsylvania
13
© 2013 A. Haeberlen, Z. Ives DynamoDB: Record-Like Key-Value Storage University of Pennsylvania13
14
What is Amazon DynamoDB?
A highly scalable, non-relational data store
Despite its name, not really a database
Stronger consistency guarantees than S3
Highly scalable; built-in replication; automatic indexing
No 'real' transactions, just a conditional put/delete
No 'real' relations and joins, just a fairly basic select
[Figure: AWS storage services: S3, DynamoDB, RDS, SimpleDB]
15
DynamoDB: Data model
Somewhat analogous to a spreadsheet:
  Domains: entire 'tables'; like buckets
  Items: names with attribute-multivalue sets
    For example, an item could have more than one street address
  It is possible to add attributes later
    No pre-defined schema

Example items (CustomerID is the name / hash key, Date is the range key, the remaining columns are key-multivalue attributes):

CustomerID | Date  | First name | Last name | Street address | City        | State | Zip   | Email
-----------|-------|------------|-----------|----------------|-------------|-------|-------|--------------
123        | 1/2/3 | Bob        | Smith     | 123 Main St    | Springfield | MO    | 65801 |
123        | 2/3/4 | Bob        | Smith     | 123 Main St    | Kansas City | MO    | 68041 |
456        |       | James      | Johnson   | 456 Front St   | Seattle     | WA    | 98104 | james@foo.com
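
The data model on this slide can be approximated with a plain dictionary. This is a sketch, not the DynamoDB API: it assumes CustomerID plays the role of the hash key and Date the role of the range key, `put_item` is a hypothetical helper, and the second email address and the date for customer 456 are made up for illustration.

```python
# Table keyed by (hash key, range key); attribute values are sets (multivalued).
customers = {}


def put_item(table, hash_key, range_key, **attributes):
    """Store an item; each attribute value is normalised to a set of values."""
    table[(hash_key, range_key)] = {
        name: set(value) if isinstance(value, (set, list, tuple)) else {value}
        for name, value in attributes.items()
    }


put_item(customers, 123, "1/2/3", first_name="Bob", city="Springfield")
put_item(customers, 123, "2/3/4", first_name="Bob", city="Kansas City")
# No fixed schema: this item has an attribute the others lack, with two values.
# The date and the second address are hypothetical.
put_item(customers, 456, "3/4/5", first_name="James",
         email=["james@foo.com", "jj@example.org"])

# All items that share hash key 123, ordered by range key.
bob_items = sorted(key for key in customers if key[0] == 123)
print(bob_items)  # [(123, '1/2/3'), (123, '2/3/4')]
```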
16
© 2013 A. Haeberlen, Z. Ives DynamoDB: Basic operations List Tables, Get Table Description Create, Delete Table GetItem, PutItem, UpdateItem, DeleteItem Can do Conditional Writes based on a value Can assign an Atomic Counter with each write, to test versions Select (like an SQL query) 16
17
© 2013 A. Haeberlen, Z. Ives DynamoDB: PutItem, UpdateItem, and GetItem PutItem/UpdateItem has a very simple model: Specify the Table, a set of key attributes, and a set of other attributes UpdateItem can specify a condition based on the Atomic Counter GetItem Specify the Table, set of key attributes Can choose whether the read should be strongly consistent or not What are the advantages of each choice? Can also assign a Condition, e.g., that a value matches some equality condition 17
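
A conditional update based on a version counter amounts to optimistic concurrency control. The sketch below is a single-process toy, assuming a hypothetical `Table` class and `ConditionFailed` exception; it only illustrates why a writer holding a stale version is rejected, and does not use the DynamoDB client library.

```python
class ConditionFailed(Exception):
    """Raised when the stored version does not match the expected one."""


class Table:
    """Toy single-node table; a real service evaluates the condition server-side."""

    def __init__(self):
        self._items = {}   # key -> (version, attributes)

    def get_item(self, key):
        """Return (version, attributes); version 0 means the item does not exist."""
        return self._items.get(key, (0, {}))

    def update_item(self, key, attributes, expected_version):
        current_version, current = self._items.get(key, (0, {}))
        if current_version != expected_version:
            raise ConditionFailed(
                f"expected v{expected_version}, found v{current_version}")
        # Condition holds: merge attributes and bump the counter atomically.
        self._items[key] = (current_version + 1, {**current, **attributes})
        return current_version + 1


table = Table()
version, _ = table.get_item("cart-42")   # version 0: no item yet
table.update_item("cart-42", {"items": 1}, expected_version=version)

# A second writer still holding the stale version 0 is rejected.
try:
    table.update_item("cart-42", {"items": 5}, expected_version=version)
except ConditionFailed as err:
    print("rejected:", err)   # rejected: expected v0, found v1
```

The rejected writer would normally re-read the item, reapply its change, and retry with the fresh version.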
18
© 2013 A. Haeberlen, Z. Ives DynamoDB: Select A very simple “query” interface based on SQL syntax SELECT output_list FROM domain_name WHERE expression [sort expression] [limit spec] Example: "select * from books where author like 'Tan%' and price <= 55.90 and year is not null order by title desc limit 50" Can choose whether or not read should be consistent Supports a cursor 18
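
To make the semantics of the example query concrete, here is the same filter, sort, and limit written over an in-memory list. The sample rows are hypothetical, and `select_books` is an illustrative helper rather than part of any AWS library.

```python
books = [  # hypothetical sample rows
    {"title": "Clouds", "author": "Tanaka", "price": 55.90, "year": 2010},
    {"title": "Storage", "author": "Tanner", "price": 79.00, "year": 2007},
    {"title": "Queues", "author": "Tan", "price": 40.00, "year": None},
    {"title": "Zones", "author": "Tanaka", "price": 35.50, "year": 2012},
    {"title": "Indexes", "author": "Garcia", "price": 50.00, "year": 2008},
]


def select_books(rows, limit=50):
    matches = [
        row for row in rows
        if row["author"].startswith("Tan")   # author like 'Tan%'
        and row["price"] <= 55.90            # price <= 55.90
        and row.get("year") is not None      # year is not null
    ]
    matches.sort(key=lambda row: row["title"], reverse=True)  # order by title desc
    return matches[:limit]                                    # limit 50


print([row["title"] for row in select_books(books)])  # ['Zones', 'Clouds']
```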
19
© 2013 A. Haeberlen, Z. Ives Alternatives to SimpleDB There is a similar service to SimpleDB underneath most major “cloud” companies’ infrastructure Google calls theirs BigTable Yahoo’s is called PNUTS See reading list at the end All consist of items with a variable set of attribute-value pairs More flexible than a relational DBMS table But don’t support full-fledged transactions 19
20
© 2013 A. Haeberlen, Z. Ives Alternatives to DynamoDB There is a similar service to DynamoDB underneath most major “cloud” companies’ infrastructure In open source there are platforms like HBase, Cassandra, MongoDB, Accumulo that do similar things Google calls theirs BigTable Yahoo’s is called PNUTS See reading list at the end All consist of items with a variable set of attribute-value pairs More flexible than a relational DBMS table But don’t support full-fledged transactions 20
21
© 2013 A. Haeberlen, Z. Ives Recap: Amazon DynamoDB A scalable, non-relational data store Domains, items, keys, values Stronger consistency than S3 No pre-defined schema 21University of Pennsylvania
22
© 2013 A. Haeberlen, Z. Ives Where could we go beyond this? KVSs present one of the simplest data representations: key + one or more objects/properties Some alternatives: Relational databases represent data as interlinked tables (in essence, a limited form of a graph) Hierarchical storage systems represent data as nested entities JSON / Document stores (e.g., MongoDB) support JSON or HTML More general graph storage might represent entire graph structures with links All are implementable over a KVS But all allow higher level requests (e.g., paths), and might optimize for this Example: I know that the customer always asks for images related to patients’ records, so maybe we should put the two in the same place 22
23
© 2013 A. Haeberlen, Z. Ives Summary: Cloud Key/Value Stores Attempt to provide very high durability, availability in a persistent, geographically distributed storage system Need to choose compromises due to limitations of communications, hardware, software Large, seldom-changing objects – eventual consistency and versioned model in S3 Small, more frequently changing objects – lower-latency response, conditional updates in DynamoDB Both are useful in different situations We’ll be using DynamoDB in our assignments, incl HW1M2 23
24
© 2013 A. Haeberlen, Z. Ives Beyond Storage: Other Cloud Services University of Pennsylvania24
25
© 2013 A. Haeberlen, Z. Ives Beyond Storage, What if… I want to host a Web site? Or a Web service? Or an instance of a DBMS that I closely manage? Amazon (and Azure and Google) give several options, including services they manage (e.g., Amazon RDS) and a bare-bones service you manage “Infrastructure as a Service”, IaaS Amazon Elastic Compute Cloud (EC2), Azure Virtual Machines, Google Compute Engine University of Pennsylvania 25
26
© 2013 A. Haeberlen, Z. Ives Amazon EC2 Logging into AWS Management Console Launching an instance Contacting the instance via ssh Terminating an instance Have a look at the AWS Getting Started guide: http://www.cis.upenn.edu/~nets212/handouts/aws-getting-started.pdf University of Pennsylvania 26
27
Oh no - where has my data gone?
EC2 instances do not have persistent storage
  Data survives stops & reboots, but not termination
  Warning: If you store data on the virtual hard disk of your instance and the instance fails or you terminate it, your data WILL be lost!
So where should I put persistent data?
  Elastic Block Store (EBS) - in a few slides
  Ideally, use an AMI with an EBS root (Amazon's default AMI has this property)
28
© 2013 A. Haeberlen, Z. Ives Amazon Machine Images When I launch an instance, what software will be installed on it? Software is taken from an Amazon Machine Image (AMI) Selected when you launch an instance Essentially a file system that contains the operating system, applications, and potentially other data Lives in S3 How do I get an AMI? Amazon provides several generic ones, e.g., Amazon Linux, Fedora Core, Windows Server,... You can make your own You can even run your own custom kernel (with some restrictions) University of Pennsylvania 28
29
© 2013 A. Haeberlen, Z. Ives Security Groups Basically, a set of firewall rules Can be applied to groups of EC2 instances Each rule specifies a protocol, port numbers, etc... Only traffic matching one of the rules is allowed through Sometimes need to explicitly open ports University of Pennsylvania 29 Instance Evil attacker Legitimate user (you or your customers)
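
The "only traffic matching one of the rules is allowed through" behaviour is a default-deny filter. A minimal sketch follows, assuming a hypothetical `Rule` record with a protocol, a port range, and a source CIDR block; the addresses are documentation-only example ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str    # e.g. "tcp"
    from_port: int
    to_port: int
    source: str      # CIDR block allowed to connect


def is_allowed(rules, protocol, port, source_ip):
    """Default-deny: traffic passes only if at least one rule matches it."""
    return any(
        rule.protocol == protocol
        and rule.from_port <= port <= rule.to_port
        and ip_address(source_ip) in ip_network(rule.source)
        for rule in rules
    )


# Hypothetical group: SSH from one address range, HTTP from anywhere.
web_group = [
    Rule("tcp", 22, 22, "203.0.113.0/24"),
    Rule("tcp", 80, 80, "0.0.0.0/0"),
]

print(is_allowed(web_group, "tcp", 80, "198.51.100.7"))  # True
print(is_allowed(web_group, "tcp", 22, "198.51.100.7"))  # False
print(is_allowed(web_group, "tcp", 22, "203.0.113.9"))   # True
```

This is also why ports sometimes need to be opened explicitly: with no matching rule, the connection is simply dropped.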
30
Regions and Availability Zones
Where exactly does my instance run?
  No easy way to find out - Amazon does not say
Instances can be assigned to regions
  Currently 9 available: US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia/Pacific (Singapore), Asia/Pacific (Sydney), Asia/Pacific (Tokyo), South America (Sao Paulo), AWS GovCloud
  Important, e.g., for reducing latency to customers
Instances can be assigned to availability zones
  Purpose: Avoid correlated faults
  Several availability zones within each region
31
© 2013 A. Haeberlen, Z. Ives Network pricing AWS does charge for network traffic Price depends on source and destination of traffic Free within EC2 and other AWS svcs in same region (e.g., S3) Remember: ISPs are typically charged for upstream traffic University of Pennsylvania 31 http://aws.amazon.com/ec2/pricing (9/18/2014)
32
Instance types
So far: On-demand instances
Also available: Reserved instances
  One-time reservation fee to purchase for 1 or 3 years
  Usage still billed by the hour, but at a considerable discount
Also available: Spot instances
  Spot market: Can bid for available capacity
  Instance continues until terminated or price rises above bid
Source: http://aws.amazon.com/ec2/reserved-instances/
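
The spot-instance rule ("instance continues until terminated or price rises above bid") can be illustrated with a short simulation. The price series is hypothetical and `spot_hours` is an illustrative helper; real spot billing has further details that are not modelled.

```python
def spot_hours(bid, hourly_prices):
    """Count the hours an instance runs before the spot price exceeds the bid."""
    hours = 0
    for price in hourly_prices:
        if price > bid:
            break      # the instance is terminated once the price rises above the bid
        hours += 1
    return hours


prices = [0.031, 0.029, 0.035, 0.052, 0.030]   # hypothetical $/hour
print(spot_hours(bid=0.04, hourly_prices=prices))  # 3
```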
33
© 2013 A. Haeberlen, Z. Ives Service Level Agreement University of Pennsylvania 33 http://aws.amazon.com/ec2-sla/ (9/11/2013; excerpt) 4.38h downtime per year allowed
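
The 4.38 hours quoted on this slide is what an availability of 99.95% works out to over a 365-day year. The check below is plain arithmetic; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years


def allowed_downtime_hours(availability):
    """Hours per year a service may be down while still meeting `availability`."""
    return (1 - availability) * HOURS_PER_YEAR


print(round(allowed_downtime_hours(0.9995), 2))  # 4.38
```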
34
© 2013 A. Haeberlen, Z. Ives Recap: EC2 What EC2 is: IaaS service - you can rent virtual machines Various types: Very small to very powerful How to use EC2: Ephemeral state - local data is lost when instance terminates AMIs - used to initialize an instance (OS, applications,...) Security groups - "firewalls" for your instances Regions and availability zones On-demand/reserved/spot instances Service level agreement (SLA) University of Pennsylvania 34
35
© 2013 A. Haeberlen, Z. Ives Virtual Disks for EC2 University of Pennsylvania35
36
© 2013 A. Haeberlen, Z. Ives Elastic Block Store (EBS) Persistent storage Unlike the local instance store, data stored in EBS is not lost when an instance fails or is terminated Should I use the instance store or EBS? Typically, instance store is used for temporary data University of Pennsylvania 36 Instance EBS storage
37
© 2013 A. Haeberlen, Z. Ives Volumes EBS storage is allocated in volumes A volume is a 'virtual disk' (size: 1GB - 1TB) Basically, a raw block device Can be attached to an instance (but only one at a time) A single instance can access multiple volumes Placed in specific availability zones Why is this useful? Be sure to place it near instances (otherwise can't attach) Replicated across multiple servers Data is not lost if a single server fails Amazon: Annual failure rate is 0.1-0.5% for a 20GB volume University of Pennsylvania 37
38
© 2013 A. Haeberlen, Z. Ives EC2 instances with EBS roots EC2 instances can have an EBS volume as their root device ("EBS boot") Result: Instance data persists independently from the lifetime of the instance You can stop and restart the instance, similar to suspending and resuming a laptop You won't be charged for the instance while it is stopped (only for EBS) You can enable termination protection for the instance Blocks attempts to terminate the instance (e.g., by accident) until termination protection is disabled again Alternative: Use instance store as the root You can still store temporary data on it, but it will disappear when you terminate the instance You can still create and mount EBS volumes explicitly University of Pennsylvania 38
39
Snapshots
You can create a snapshot of a volume
  Copy of data in the volume at the time snapshot was made
  Only the first snapshot makes a full copy; subsequent snapshots are incremental
What are snapshots good for?
  Sharing data with others
    DBpedia snapshot ID is "snap-882a8ae3"
    Access control list (specific account numbers) or public access
  Instantiate new volumes
  Point-in-time backups
[Figure: snapshots taken along a time axis]
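
A minimal sketch of "only the first snapshot makes a full copy; subsequent snapshots are incremental", assuming the volume is a mapping from block index to block contents. Both helper functions are hypothetical and say nothing about how EBS stores snapshots internally.

```python
def take_snapshot(volume_blocks, previous_blocks=None):
    """Return only the blocks that changed since the previous snapshot.

    With no previous snapshot, every block is copied (a full snapshot).
    """
    previous_blocks = previous_blocks or {}
    return {
        index: data
        for index, data in volume_blocks.items()
        if previous_blocks.get(index) != data
    }


def restore(snapshots):
    """Rebuild the volume by replaying the snapshot chain in order."""
    volume = {}
    for delta in snapshots:
        volume.update(delta)
    return volume


volume = {0: "boot", 1: "data-a", 2: "data-b"}
snap1 = take_snapshot(volume)                   # full copy: 3 blocks
state_at_snap1 = dict(volume)
volume[1] = "data-a2"                           # one block changes
snap2 = take_snapshot(volume, state_at_snap1)   # incremental: 1 block

print(len(snap1), len(snap2))             # 3 1
print(restore([snap1, snap2]) == volume)  # True
```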
40
© 2013 A. Haeberlen, Z. Ives Pricing You pay for... Storage space: $0.10 per allocated GB per month I/O requests: $0.10 per million I/O requests S3 operations (GET/PUT) Charge is only for actual storage used Empty space does not count University of Pennsylvania 40
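
As a worked example of the two volume charges above, the sketch below uses the rates from the slide; the workload figures (100 GB allocated, 5 million I/O requests) are hypothetical, and S3 snapshot charges are left out.

```python
STORAGE_PER_GB_MONTH = 0.10   # $ per allocated GB per month (from the slide)
PER_MILLION_IO = 0.10         # $ per million I/O requests (from the slide)


def monthly_ebs_cost(allocated_gb, io_requests):
    """Monthly volume cost in dollars: storage charge plus I/O charge."""
    storage = allocated_gb * STORAGE_PER_GB_MONTH
    io = io_requests / 1_000_000 * PER_MILLION_IO
    return round(storage + io, 2)


print(monthly_ebs_cost(allocated_gb=100, io_requests=5_000_000))  # 10.5
```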
41
© 2013 A. Haeberlen, Z. Ives Creating an EBS volume University of Pennsylvania 41 Needs to be in same availability zone as your instance! DBpedia snapshot ID Create volume
42
Mounting an EBS volume
Step 1: Attach the volume

  mkse212@vm:~$ ec2-attach-volume -d /dev/sda2 -i i-9bd6eef1 vol-cca68ea5
  ATTACHMENT vol-cca68ea5 i-9bd6eef1 /dev/sda2 attaching
  mkse212@vm:~$

Step 2: Mount the volume in the instance

  mkse212@vm:~$ ssh ec2-user@ec2-50-17-64-130.compute-1.amazonaws.com

         __|  __|_  )  Amazon Linux AMI
         _|  (     /     Beta
        ___|\___|___|

  See /usr/share/doc/system-release-2011.02 for latest release notes. :-)
  [ec2-user@ip-10-196-82-65 ~]$ sudo mount /dev/sda2 /mnt/
  [ec2-user@ip-10-196-82-65 ~]$ ls /mnt/
  dbpedia_3.5.1.owl  dbpedia_3.5.1.owl.bz2  en  other_languages
  [ec2-user@ip-10-196-82-65 ~]$
43
Detaching an EBS volume
Step 1: Unmount the volume in the instance

  [ec2-user@ip-10-196-82-65 ~]$ sudo umount /mnt/
  [ec2-user@ip-10-196-82-65 ~]$ exit
  mkse212@vm:~$

Step 2: Detach the volume

  mkse212@vm:~$ ec2-detach-volume vol-cca68ea5
  ATTACHMENT vol-cca68ea5 i-9bd6eef1 /dev/sda2 detaching
  mkse212@vm:~$
44
© 2013 A. Haeberlen, Z. Ives Recap: Elastic Block Store (EBS) What EBS is: Basically a virtual hard disk; can be attached to EC2 instances Persistent - state survives termination of EC2 instance How to use EBS: Allocate volume - empty or initialized with a snapshot Attach it to EC2 instance and mount it there Can create snapshots for data sharing, backup University of Pennsylvania 44
45
© 2013 A. Haeberlen, Z. Ives Cloud Case Studies University of Pennsylvania45
46
Recall Some Cloud Definitions
As discussed previously, “cloud” is a broad term but comprises:
  Very large data centers with thousands of commodity machines
  Multiple, geographically distributed sites
  Common management infrastructure
  Common programming infrastructure that automatically allocates requests and/or jobs to available machines
Difference between public and private clouds?
  Public clouds sub-contract out to multiple clients; private clouds are controlled by one organization
47
© 2013 A. Haeberlen, Z. Ives Recap: Types of clouds Software as a Service (SaaS): cloud-hosted apps think Hotmail, GMail, Google Docs, Office Web, … where Microsoft, etc. want to go – subscriptions & ads Platform as a Service (PaaS): programming layer and services over the cloud think Hadoop, MS Azure, extensible apps, Google Maps Infrastructure as a Service (IaaS): virtual machines, virtualized networks and disks think Amazon EC2 typically includes Storage as a Service: EBS, etc. also some variants like content delivery networks University of Pennsylvania47
48
The major public Cloud providers
Amazon is the big player
  Multiple services: infrastructure as a service, platform as a service (incl. Hadoop), storage as a service
But there are many others:
  Microsoft Azure – has a similar stack to Amazon
  Google App Engine + Compute Engine – again, similar
    Also software as a service: GMail, Docs, etc.
  IBM, HP, Yahoo – seem to focus mostly on enterprise (often private) cloud apps (not small-business level)
  Rackspace, Terremark/Verizon – mostly infrastructure as a service
49
© 2013 A. Haeberlen, Z. Ives Case Studies We’ll look at successful examples of: SaaS: Salesforce.com PaaS: Facebook IaaS: Netflix University of Pennsylvania49
50
© 2013 A. Haeberlen, Z. Ives A SaaS Example: Salesforce.com University of Pennsylvania50
51
© 2013 A. Haeberlen, Z. Ives Perhaps the first truly successful “software as a service” platform Predated the term “cloud” (founded in 1999) – and was initially met with skepticism Now the IBMs, MSs of the world want to be like them: a constant revenue stream, unlike shrink-wrapped software What is the software being provided? “Customer Relationship Management” – tools for sales people to find customers, keep in contact with them Gives a bird’s-eye view of customers’ status, in-flight orders, order history, leads, approvals, etc. Salesforce.com University of Pennsylvania51
52
Salesforce.com: A Timeline
Founded in 1999: first proponents of the term ‘cloud’, with support from Larry Ellison (Oracle)
First CRM offered as SaaS (Software as a Service)
2005: offered Force.com as a platform for apps
2010: Chatter launched, Heroku acquired
2011: Radian6 acquired, more than 90,000 customers
© 2012 A. Subramanian
53
© 2013 A. Haeberlen, Z. Ives What does it look like? 53University of Pennsylvania © 2012 A. Subramanian
54
© 2013 A. Haeberlen, Z. Ives Example Salesforce “Dashboard” University of Pennsylvania54
55
How Salesforce.com works
Basic architecture as of Mar 2009:
  'Only' about 1000 mirrored machines for 55K enterprise customers, 1.5M subscribers
  10 Oracle databases across 50 servers
    About 20 predefined tables / schemas, shared across all customers, 100s of TB
    Sophisticated, proprietary query optimization and indexing
  AJAX Web interface with various communication services
    Tracking for Twitter, collaborative tools, etc.
    Easy “tunnels” for sharing across customers
  Plug-ins for extensions via Platform-as-a-Service “force.com” – 30M lines of 3rd-party code
56
© 2013 A. Haeberlen, Z. Ives Salesforce.com Architecture Multi-tenant: Each datacenter contains servers shared across customers Performance maintained by limits App logic separation Scales vertically (adding more cores, improving index strategies) 56University of Pennsylvania © 2012 A. Subramanian
57
Salesforce.com Technology Stack
Consists of Oracle RAC (Real Application Clusters) nodes
  These allow transparent access to a single database instance by multiple clients
Largest standing Oracle installation in the world
© 2012 A. Subramanian
58
© 2013 A. Haeberlen, Z. Ives Why Salesforce is so effective Their value proposition: outsource your main corporate IT to them They bill per month – force.com $15/user/month They can offer it cheaper than corporate IT: Leverage the same infrastructure, design, and support across many companies at the same time – “multi-tenancy” Some customers: Dell, AMD, SunTrust, Spring, Computer Associates, Kaiser Permanente University of Pennsylvania58
59
© 2013 A. Haeberlen, Z. Ives PaaS Case Study: Facebook University of Pennsylvania59
60
© 2013 A. Haeberlen, Z. Ives Users of Platform as a Service Facebook provides some PaaS capabilities to application developers Web services – remote APIs – that allow access to social network properties, data, “Like” button, etc. Many third-parties run their apps off Amazon EC2, and interface to Facebook via its APIs – PaaS + IaaS Facebook itself makes heavy use of PaaS services for their own private cloud Key problems: how to analyze logs, make suggestions, determine which ads to place See also Chapter 16 of the Tom White book University of Pennsylvania60
61
© 2013 A. Haeberlen, Z. Ives Facebook API: Overview What you can do: Read data from profiles and pages Navigate the graph (e.g., via friends lists) Issue queries (for posts, people, pages,...) Add or modify data (e.g., create new posts) Get real-time updates, issue batch requests,... How you can access it: Graph API FQL Legacy REST API 61University of Pennsylvania
62
Facebook API: The Graph API (1/2)
Requests are mapped directly to HTTP:
  https://graph.facebook.com/(identifier)?fields=(fieldList)
Response is in JSON:
  {
    "id": "1074724712",
    "age_range": { "min": 21 },
    "locale": "en_US",
    "location": { "id": "101881036520836", "name": "Philadelphia, Pennsylvania" }
  }
63
© 2013 A. Haeberlen, Z. Ives Facebook API: The Graph API (2/2) Uses several HTTP methods: GET for reading POST for adding or modifying DELETE for removing IDs can be numeric or names /1074724712 or /andreas.haeberlen Pages also have IDs Authorization is via 'access tokens' Opaque string; encodes specific permissions (access user location, but not interests, etc.) Has an expiration date, so may need to be refreshed 63University of Pennsylvania
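
The request format from the previous slide and the access token described here can be combined as follows. This sketch only builds the URL and parses the sample response shown earlier; it sends nothing over the network. The `graph_url` helper is hypothetical, and passing the token as an `access_token` query parameter is an assumption rather than something the slides state.

```python
import json
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def graph_url(identifier, fields, access_token=None):
    """Build a Graph API read URL; no request is actually sent here."""
    params = {"fields": ",".join(fields)}
    if access_token is not None:
        params["access_token"] = access_token   # parameter name is an assumption
    return f"{GRAPH_BASE}/{identifier}?{urlencode(params)}"


print(graph_url("1074724712", ["id", "locale", "location"]))
# https://graph.facebook.com/1074724712?fields=id%2Clocale%2Clocation

# The JSON response shown on the previous slide, parsed into a dictionary.
response = json.loads("""
{ "id": "1074724712", "age_range": { "min": 21 }, "locale": "en_US",
  "location": { "id": "101881036520836", "name": "Philadelphia, Pennsylvania" } }
""")
print(response["location"]["name"])  # Philadelphia, Pennsylvania
```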
64
© 2013 A. Haeberlen, Z. Ives Facebook Data Management / Warehousing Tasks Main tasks for “cloud” infrastructure: Summarization (daily, hourly) to help guide development on different components to report on ad performance recommendations Ad hoc analysis: Answer questions on historical data – to help with managerial decisions Archival of logs Spam detection Ad optimization... Initially used Oracle DBMS for this But eventually hit scalability, cost, performance bottlenecks... just like Salesforce does now University of Pennsylvania64
65
Data Warehousing at Facebook
>2PB of data
10TB added every day
Mostly HDFS (+ some MySQL)
2,400 cores
9TB of memory
Source: http://sites.ieee.org/scv-cs/files/2011/03/Facebook-Hive-by-Ashish-Thusoo.pdf
66
PaaS at Facebook (partial list of components; these have evolved)
Scribe – open source logging, actually records the data that will be analyzed by Hadoop
Hadoop (MapReduce – discussed next time) as batch processing engine for data analysis
  As of 2009: 2nd-largest Hadoop cluster in the world, 2400 cores, > 2PB data with > 10TB added every day
Hive – SQL over Hadoop, used to write the data analysis queries
Federated MySQL, Oracle – multi-machine DBMSs to store query results
67
© 2013 A. Haeberlen, Z. Ives Example Use Case 1: Ad Details Advertisers need to see how their ads are performing Cost-per-click (CPC), cost-per-1000-impressions (CPM) Social ads – include info from friends Engagement ads – interactive with video Performance numbers given: Number unique users, clicks, video views, … Main axes: Account, campaign, ad Time period Type of interaction Users Summaries are computed using Hadoop via Hive University of Pennsylvania67
68
© 2013 A. Haeberlen, Z. Ives Use Case 2: Ad Hoc analysis, feedback Engineers, product managers may need to understand what is going on e.g., impact of a new change on some sub-population Again, Hive-based, i.e., queries are in SQL with database joins Combine data from several tables, e.g., click-through rate = views combined with clicks Sometimes requires custom analysis code with sampling University of Pennsylvania68
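
The click-through-rate example amounts to joining per-ad view counts with per-ad click counts. At Facebook this would be a Hive query over Hadoop; the sketch below does the same join in memory over hypothetical log records, purely to show the computation.

```python
from collections import Counter

# Hypothetical log records: (ad_id, user_id)
views = [("ad1", "u1"), ("ad1", "u2"), ("ad1", "u3"), ("ad1", "u4"),
         ("ad2", "u1"), ("ad2", "u2")]
clicks = [("ad1", "u2"), ("ad2", "u1")]


def click_through_rates(views, clicks):
    """Join per-ad view and click counts; CTR = clicks / views."""
    view_counts = Counter(ad for ad, _ in views)
    click_counts = Counter(ad for ad, _ in clicks)
    return {ad: click_counts[ad] / n for ad, n in view_counts.items()}


print(click_through_rates(views, clicks))  # {'ad1': 0.25, 'ad2': 0.5}
```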
69
© 2013 A. Haeberlen, Z. Ives IaaS Case Study: Netflix University of Pennsylvania69
70
© 2013 A. Haeberlen, Z. Ives IaaS example: Netflix Perhaps Amazon’s highest-profile customer In 12/2010, most of their traffic was served from AWS A year earlier, none of it was Why did Netflix take this step? Needed to re-architect after a phase of growth Ability to question everything Focus on their core competence (content); leave the 'heavy lifting' (datacenter operation) to Amazon Customer growth & device engagement hard to predict With the cloud, they don't have to Belief that cloud computing is the future Gain experience with an increasingly important technology University of Pennsylvania70
71
© 2013 A. Haeberlen, Z. Ives How Netflix uses AWS Streaming movie retrieval and playback Media files stored in S3 “Transcoding” to target devices (Wii, iPad, etc.) using EC2 Web site modules Movie lists and search – app hosted by Amazon Web Services Recommendations Analysis of streaming sessions, business metrics – using Elastic MapReduce University of Pennsylvania71
72
© 2013 A. Haeberlen, Z. Ives Netflix: 5 Lessons learned using AWS Dorothy, you're not in Kansas anymore Be prepared to unlearn a lot of what you know Example: Assumptions about network capacity, hw reliability Co-tenancy is hard Throughput variance can occur at any level in the stack Best way to avoid failure: Fail constantly Design for failure independence; use the 'Chaos Monkey' Learn with real scale, not toy models Only full-scale traffic shows where the real bottlenecks are Commit yourself 72University of Pennsylvania
73
© 2013 A. Haeberlen, Z. Ives Discussion University of Pennsylvania73
74
© 2013 A. Haeberlen, Z. Ives Other users, and the future Startups, especially, are making great use of EC2, Rackspace, etc. for their hosting needs compare to 10 years ago – dot-com boom – where you started by buying a cluster of SPARC machines Government, health care, science, many enterprises have great interest in cost savings of the cloud But concerns remain – esp. with respect to security, privacy, availability … And moreover: the last word has not been written on how to program the cloud University of Pennsylvania74
75
© 2013 A. Haeberlen, Z. Ives Given this discussion… Our goal for the remainder of the semester: learn how to build applications much like the ones we discussed We’ll use many of the same programming platforms, tools, etc. And there will be an AJAX, Web-based emphasis on the projects University of Pennsylvania75
76
© 2013 A. Haeberlen, Z. Ives Next time The first “programming model for the cloud”: MapReduce Not really a language – but a set of interfaces and a runtime system Please read Dean & Ghemawat paper – the Google work that spawned it all Later in the semester we’ll see more sophisticated models, including some research ones University of Pennsylvania76
77
© 2013 A. Haeberlen, Z. Ives Stay tuned Next time you will learn about: A programming model for the Cloud 77University of Pennsylvania