CPT-S 580-06 Advanced Databases 1 Yinghui Wu EME 49 ADB (ln26)


1 CPT-S 580-06 Advanced Databases 1 Yinghui Wu EME 49 ADB (ln26)

2 DBMS and Cloud Computing
Cloud computing: overview
Database design in cloud

3 Cloud computing: concept

4 The Hype!
Forrester in 2010: Cloud computing will go from $40.7 billion in 2010 to $241 billion in 2020.
Gartner in 2009: Cloud computing revenue will soar faster than expected and will exceed $150 billion by 2013. It will represent 19% of IT spending by 2015.
IDC in 2009: “Spending on IT cloud services will triple in the next 5 years, reaching $42 billion.”
Companies and even federal/state governments are using cloud computing now: fedbizopps.gov

5 What is Cloud Computing?
Cloud computing is a general term for a new class of network-based computing that takes place over the Internet:
–basically a step on from utility computing
–a collection of integrated and networked hardware, software and Internet infrastructure (called a platform)
–uses the Internet for communication and transport to provide hardware, software and networking services to clients
These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).

6 What is Cloud Computing?
In addition, the platform provides on-demand services that are always on and available anywhere, anytime.
Pay for use and as needed; elastic:
–scale up and down in capacity and functionality
The hardware and software services are available to:
–the general public, enterprises, corporations and business markets
A number of characteristics:
–Remotely hosted: services or data are hosted on remote infrastructure.
–Ubiquitous: services or data are available from anywhere.
–Commodified: the result is a utility computing model similar to that of traditional utilities, like gas and electricity; you pay for what you use.

7 Cloud Architecture

8 Cloud Computing Characteristics
Common Characteristics: Low Cost Software, Virtualization, Service Orientation, Advanced Security, Homogeneity, Massive Scale, Resilient Computing, Geographic Distribution
Essential Characteristics: Resource Pooling, Broad Network Access, Rapid Elasticity, Measured Service, On-Demand Self-Service

9 What is a Cloud?
A single-site cloud (aka “Datacenter”) consists of
–Compute nodes (grouped into racks)
–Switches, connecting the racks
–A network topology, e.g., hierarchical
–Storage (backend) nodes connected to the network
–Front-end for submitting jobs and receiving client requests
–(Often called 3-tier architecture)
–Software Services
A geographically distributed cloud consists of
–Multiple such sites
–Each site perhaps with a different structure and services

10 On-demand access: *aaS Classification
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
[Figure labels: Google App Engine, SalesForce CRM, LotusLive]

11 On-demand access: *aaS Classification
On-demand: renting a cab vs. (previously) renting a car, or buying one. E.g.:
–AWS Elastic Compute Cloud (EC2): a few cents to a few $ per CPU hour
–AWS Simple Storage Service (S3): a few cents to a few $ per GB-month
HaaS: Hardware as a Service
–You get access to barebones hardware machines and do whatever you want with them. Ex: your own cluster
–Not always a good idea because of security risks
IaaS: Infrastructure as a Service
–You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this (what’s another way, e.g., using Linux?). Often said to subsume HaaS.
–Ex: Amazon Web Services (AWS: EC2 and S3), Eucalyptus, Rightscale, Microsoft Azure, Google Compute Engine.
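
The per-CPU-hour and per-GB-month pricing mentioned on this slide can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: the two rates are hypothetical placeholders in the "a few cents" range, not actual AWS prices, and `monthly_cost` is a made-up helper name.

```python
# Hypothetical rates, NOT real AWS prices (assumptions for illustration only).
CPU_HOUR_RATE = 0.10   # dollars per CPU hour
GB_MONTH_RATE = 0.05   # dollars per GB-month


def monthly_cost(cpu_hours: float, stored_gb: float) -> float:
    """On-demand bill for one month: pay only for what was actually used."""
    return cpu_hours * CPU_HOUR_RATE + stored_gb * GB_MONTH_RATE


# Example workload: 4 machines running 8 hours a day for 30 days, 200 GB stored.
bill = monthly_cost(cpu_hours=4 * 8 * 30, stored_gb=200)
print(f"${bill:.2f}")
```

Halving the daily running hours halves the compute part of the bill, which is the "renting a cab" property the slide refers to.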

12 On-demand access: *aaS Classification
PaaS: Platform as a Service
–You get access to flexible computing and storage infrastructure, coupled with a software platform (often tightly coupled)
–Ex: Google’s App Engine (Python, Java, Go)
SaaS: Software as a Service
–You get access to software services when you need them. Often said to subsume SOA (Service Oriented Architectures).
–Ex: Google Docs, MS Office on demand

13 Cloud computing: pros, cons and thoughts

14 Opportunities and Challenges
The use of the cloud provides a number of opportunities:
–It enables services to be used without any understanding of their infrastructure.
–Cloud computing works using economies of scale: it potentially lowers the outlay expense for start-up companies, as they no longer need to buy their own software or servers. Cost is set by on-demand pricing. Vendors and service providers recover their costs by establishing an ongoing revenue stream.
–Data and services are stored remotely but accessible from “anywhere”.

15 Opportunities and Challenges
In parallel there has been a backlash against cloud computing:
–Use of cloud computing means dependence on others, which could limit flexibility and innovation: the others are likely to be the bigger Internet companies like Google and IBM, who may monopolise the market. Some argue that this use of supercomputers is a return to the time of mainframe computing that the PC was a reaction against.
–Security could prove to be a big issue: it is still unclear how safe outsourced data is, and when using these services ownership of data is not always clear.
–There are also issues relating to policy and access: if your data is stored abroad, whose policy do you adhere to? What happens if the remote server goes down? How will you then access files? There have been cases of users being locked out of accounts and losing access to data.

16 The Future
Many of the activities loosely grouped together under cloud computing have already been happening, and centralised computing activity is not a new phenomenon.
However, there are concerns that the mainstream adoption of cloud computing could cause many problems for users.
Many new open source systems are appearing that you can install and run on your local cluster
–you should be able to run a variety of applications on these systems

17 Design of scalable DBMS over cloud: transactions
Divy Agrawal, Sudipto Das, and Amr El Abbadi (VLDB 10)

18 Design Principle (I)
Separate System and Application State
–System metadata is critical but small
–Application data has varying needs
–Separation allows use of different classes of protocols

19 Design Principle (II)
Limit interactions to a single node
–Allows systems to scale horizontally
–Graceful degradation during failures
–Obviates the need for distributed synchronization
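
One common way to keep each request on a single node is to route it by a hash of its key, so that nothing on the request path needs cross-node coordination. The sketch below assumes a fixed list of hypothetical node names and simple modulo hashing. The slides do not prescribe a particular partitioning scheme, so this is an illustration rather than the authors' design.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def owner_of(key: str) -> str:
    """Map a key to exactly one node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def handle_request(key: str) -> str:
    # The whole request is served by one node; no other node is contacted.
    return f"{owner_of(key)} serves {key}"


print(handle_request("user:17"))
```

Because every key maps to one node, adding nodes spreads the load, and the failure of one node only affects the keys it serves.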

20 Design Principle (III)
Decouple Ownership from Data Storage
–Ownership refers to exclusive read/write access to data
–Partitioning ownership effectively partitions the data
–Decoupling allows lightweight ownership transfer
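
A minimal sketch of why decoupling makes ownership transfer lightweight: if the data sits in shared storage and ownership is only a small record, moving a partition to another node changes that record and copies no data. The class and method names (`SharedStore`, `write`, `transfer`) are hypothetical and chosen for illustration only.

```python
class SharedStore:
    """Hypothetical sketch: data in shared storage, ownership in a small map."""

    def __init__(self):
        self.data = {}    # partition id -> {key: value}, kept in shared storage
        self.owner = {}   # partition id -> node name (small system metadata)

    def write(self, node, partition, key, value):
        # Only the current owner has read/write access to the partition.
        if self.owner.get(partition) != node:
            raise PermissionError(f"{node} does not own {partition}")
        self.data.setdefault(partition, {})[key] = value

    def transfer(self, partition, new_node):
        # Lightweight: only the ownership record changes, no data is copied.
        self.owner[partition] = new_node


store = SharedStore()
store.transfer("p1", "node-A")
store.write("node-A", "p1", "x", 1)
store.transfer("p1", "node-B")       # the data for p1 stays where it is
store.write("node-B", "p1", "y", 2)
```

After the transfer, a write from `node-A` to `p1` is rejected, while the values written by both owners remain in the same stored partition.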

21 Design Principle (IV)
Limited distributed synchronization is practical
–Maintenance of metadata
–Provide strong guarantees only for data that needs it

22 Two Approaches to Scalability
Data Fusion
–Enrich key-value stores
–G-Store: efficient transactional multi-key access [ACM SoCC 2010]
Data Fission
–Cloud-enabled relational databases
–ElasTraS: Elastic TranSactional Database [HotCloud 2009; Tech. Report 2010]

23 Data fusion
Divy Agrawal, Sudipto Das, and Amr El Abbadi (VLDB 10)

24 Atomic Multi-key Access [Das et al., ACM SoCC 2010]
Key-value stores:
–Atomicity guarantees on single keys
–Suitable for the majority of current web applications
Many other applications need multi-key accesses:
–Online multi-player games
–Collaborative applications
Enrich the functionality of key-value stores

25 Key Group Abstraction
Define a granule of on-demand transactional access
Applications select any set of keys to form a group
Data store provides transactional access to the group
Non-overlapping groups
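
The non-overlap rule on this slide can be sketched as a small registry that refuses to create a group if any requested key already belongs to another group. This is an illustration of the abstraction under assumed names (`KeyGroups`, `create`, `delete`), not the G-Store API.

```python
class KeyGroups:
    """Hypothetical registry enforcing that key groups never overlap."""

    def __init__(self):
        self._group_of = {}   # key -> name of the group that contains it

    def create(self, name, keys):
        keys = set(keys)
        taken = sorted(k for k in keys if k in self._group_of)
        if taken:
            raise ValueError(f"keys already grouped: {taken}")
        for k in keys:
            self._group_of[k] = name

    def delete(self, name):
        # Dissolving a group frees its keys for future groups.
        for k in [k for k, g in self._group_of.items() if g == name]:
            del self._group_of[k]

    def group_of(self, key):
        return self._group_of.get(key)


groups = KeyGroups()
groups.create("game-42", ["alice", "bob"])   # an application-chosen set of keys
```

A second `create` that includes `"bob"` raises `ValueError` until `"game-42"` is deleted, which matches the on-demand, temporary nature of a group.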

26 Horizontal Partitions of the Keys
A single node gains ownership of all keys in a KeyGroup
[Figure labels: Keys located on different nodes; Key Group; Group Formation Phase]

27 Key Grouping Protocol
Conceptually akin to “locking”
Allows collocation of ownership at the leader
Leader is the gateway for group accesses
“Safe” ownership transfer: deal with dynamics of the underlying key-value store
–Data dynamics of the key-value store
–Various failure scenarios
Hides complexity from the applications while exposing a richer functionality
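
The "akin to locking" idea can be sketched as follows: the leader asks the node holding each key to yield ownership, and if any key is already held by another group, the ownerships collected so far are handed back. This is a deliberately simplified, single-process illustration with hypothetical names (`Node`, `join`, `release`, `form_group`). It does not reproduce the actual G-Store messages, and it ignores the failure scenarios and data dynamics the slide mentions.

```python
class Node:
    """Hypothetical storage node that owns some keys."""

    def __init__(self, name, keys):
        self.name = name
        self.free = set(keys)   # keys owned here and not part of any group
        self.yielded = {}       # key -> leader that currently holds ownership

    def join(self, key, leader):
        # Yield ownership of a key to the leader if it is still free.
        if key in self.free:
            self.free.remove(key)
            self.yielded[key] = leader
            return True
        return False

    def release(self, key):
        # Take ownership back when a group is dissolved or formation aborts.
        if key in self.yielded:
            del self.yielded[key]
            self.free.add(key)


def form_group(leader, keys, node_of):
    """Try to collect ownership of all keys at the leader; undo on refusal."""
    acquired = []
    for key in keys:
        node = node_of[key]
        if node.join(key, leader):
            acquired.append((node, key))
        else:
            for n, k in acquired:
                n.release(k)
            return False
    return True


n1, n2 = Node("n1", ["a", "b"]), Node("n2", ["c"])
node_of = {"a": n1, "b": n1, "c": n2}
first = form_group("leader-1", ["a", "c"], node_of)
second = form_group("leader-2", ["b", "c"], node_of)  # "c" is already taken
```

Here the first group forms, while the second attempt is refused at `"c"` and gives `"b"` back, so groups stay non-overlapping.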

28 Implementing G-Store
Grouping middleware layer resident on top of a key-value store
[Figure labels: Application Clients; Transactional Multi-Key Access; on each node: Transaction Manager, Grouping Layer, Key-Value Store Logic; Distributed Storage]

29 Data fission

30 Elastic Transaction Management [Das et al., HotCloud 2009, UCSB TR 2010]
Designed to make RDBMS cloud-friendly
Database viewed as a collection of partitions
Suitable for standard OLTP workloads:
–Large single-tenant database instance: database partitioned at the schema level
–Multi-tenant with a large number of small databases: each partition is a self-contained database

31 Elastic Transaction Management
Elastic to deal with workload changes
Dynamic load balancing of partitions
Automatic recovery from node failures
Transactional access to database partitions
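
Load balancing and failure recovery over partitions can be illustrated with a simple greedy placement: put each partition, heaviest first, on the currently least-loaded node, and rerun the placement without a node when it fails. The greedy rule, the load numbers and the function names are assumptions made for this sketch. The slides do not say how ElasTraS actually places partitions.

```python
def assign(partitions, nodes):
    """Greedy placement: heaviest partition first, onto the least-loaded node."""
    load = {n: 0 for n in nodes}
    placement = {}
    for p, weight in sorted(partitions.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        placement[p] = target
        load[target] += weight
    return placement


def recover(partitions, nodes, failed):
    """Reassign every partition across the nodes that are still alive."""
    return assign(partitions, [n for n in nodes if n != failed])


# Hypothetical per-partition load (e.g. transactions per second).
partitions = {"P1": 5, "P2": 3, "P3": 2}
before = assign(partitions, ["A", "B"])
after = recover(partitions, ["A", "B"], failed="B")
```

Because each partition is a self-contained database, moving one to another node does not require coordinating with the other partitions.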

32 [Figure: ElasTraS architecture. Labels: Application Clients, Application Logic, ElasTraS Client, DB Read/Write Workload, OTM, Txn Manager, DB Partitions (P1, P2, ..., Pn), Master Proxy, MM Proxy, Log Manager, TM Master, Health and Load Management, Metadata Manager, Lease Management, Durable Writes, Distributed Fault-tolerant Storage]

33 What to optimize?
Feature                    Traditional   Cloud
Cost [$]                   fixed         optimize
Performance [tps, secs]    optimize      fixed
Scale-out [#cores]         optimize      fixed
Predictability [ ($)]      -             fixed
Consistency [%]            fixed         ???
Flexibility [#variants]    -             optimize
[Florescu & Kossmann, SIGMOD Record 2009]

34 Open Questions
How to implement the storage layer?
What is the right consistency model?
What is the right programming model?
Whether and how to make use of caching?
How to balance functionality and scale?
What are the right cloud abstractions?
Cloud interoperability
Moving beyond a single cloud
[Adapted from D. Kossmann’s ICDE 2010 Keynote]

35 References
[Cooper et al., ACM SoCC 2010] Benchmarking Cloud Serving Systems with YCSB, B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, R. Sears, ACM SoCC 2010
[Brantner et al., SIGMOD 2008] Building a Database on S3, M. Brantner, D. Florescu, D. Graf, D. Kossmann, T. Kraska, SIGMOD 2008
[Kraska et al., VLDB 2009] Consistency Rationing in the Cloud: Pay Only When It Matters, T. Kraska, M. Hentschel, G. Alonso, D. Kossmann, VLDB 2009
[Lomet et al., CIDR 2009] Unbundling Transaction Services in the Cloud, D. Lomet, A. Fekete, G. Weikum, M. Zwilling, CIDR 2009
[Das et al., HotCloud 2009] ElasTraS: An Elastic Transactional Data Store in the Cloud, S. Das, D. Agrawal, A. El Abbadi, USENIX HotCloud 2009
[Das et al., ACM SoCC 2010] G-Store: A Scalable Data Store for Transactional Multi Key Access in the Cloud, S. Das, D. Agrawal, A. El Abbadi, ACM SoCC 2010
[Das et al., TR 2010] ElasTraS: An Elastic, Scalable, and Self Managing Transactional Database for the Cloud, S. Das, S. Agarwal, D. Agrawal, A. El Abbadi, UCSB Tech Report CS 2010-04

36 References
[Yang et al., CIDR 2009] A Scalable Data Platform for a Large Number of Small Applications, F. Yang, J. Shanmugasundaram, R. Yerneni, CIDR 2009
[Kossmann et al., SIGMOD 2010] An Evaluation of Alternative Architectures for Transaction Processing in the Cloud, D. Kossmann, T. Kraska, S. Loesing, SIGMOD 2010
[Aulbach et al., SIGMOD 2009] A Comparison of Flexible Schemas for Software as a Service, S. Aulbach, D. Jacobs, A. Kemper, M. Seibold, SIGMOD 2009
[Aulbach et al., SIGMOD 2008] Multi-Tenant Databases for Software as a Service: Schema and Mapping Techniques, SIGMOD 2008
[Weissman et al., SIGMOD 2009] The Design of the Force.com Multitenant Internet Application Development Platform, C. D. Weissman, S. Bobrowski, SIGMOD 2009
[Jacobs et al., BTW 2007] Ruminations on Multi-Tenant Databases, D. Jacobs, S. Aulbach, BTW 2007
[Chang et al., OSDI 2006] Bigtable: A Distributed Storage System for Structured Data, F. Chang et al., OSDI 2006
[Cooper et al., VLDB 2008] PNUTS: Yahoo!'s Hosted Data Serving Platform, B. F. Cooper et al., VLDB 2008
[DeCandia et al., SOSP 2007] Dynamo: Amazon's Highly Available Key-Value Store, G. DeCandia et al., SOSP 2007

