Multi-Tenant Databases for SaaS (Software as a Service)


Multi-Tenant Databases for SaaS (Software as a Service) 11/17/08 (Some of the Slides are from Alfons Kemper)

Outline
- Motivations
- Multi-tenancy database (MTD)
- MTD database schema layouts
  - Existing: basic, private, extension, universal, pivot
  - New: chunk table & chunk folding
- Querying chunk tables
- Summary

SaaS & Multi-Tenancy
- SaaS (Software-as-a-Service): software managed by the SaaS vendor and run on the SaaS server
- Tenant = client organization (~10 users, small/medium business)
- Multi-instance: a separate instance of the software for each tenant
- Multi-tenancy: a single instance of the software serving multiple tenants

Why Multi-Tenancy
- Economy of scale (examples?)
  - A large number of tenants share the cost of a single software instance
  - Cost saving for each individual user
- Examples?

Multi-Tenancy Example

Multi-Tenancy in Practice
[Figure: number of tenants per database (10,000 down to 1) as a function of application complexity (small to large: Email, Proj Mgmt, CRM, ERP, Banking) and size of machine (blade to big iron)]
- Economy of scale decreases with application complexity
- At the sweet spot, compare TCO of 1 versus 100 databases

A Typical Blade Server
- IBM HS20 (Wikipedia)
- IBM BladeCenter HS12:
  - CPU: single-, dual-, or quad-core Xeon, 2~3 GHz
  - Memory (max): 24 GB
  - Internal storage (max): 293 GB
- 24 GB / 10,000 tenants ≈ 2.4 MB per tenant

Alternative: Multi-Instance + Virtualization

Multi-Tenant Databases (MTD)
- Consolidating multiple businesses onto the same operational system
  - Consolidation factor depends on the size of the application and of the host machine
- Support for schema extensibility
  - Essential for ERP applications
- Support on top of the database layer
  - Non-intrusive implementation
  - Query transformation engine maps logical tenant-specific tables to physical tables
- Various problems, for example:
  - Uneven table utilization (“hot spots”)
  - Metadata management when handling lots of tables

Classic Web Application (Basic Layout)
- Pack multiple tenants into the same tables by adding a tenant id column
- Great consolidation but no extensibility
Account
TenId | AcctId | Name | ...
17    | 1      | Acme |
17    | 2      | Gump |
35    | 1      | Ball |
42    | 1      | Big  |
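The basic layout can be tried out with a few lines of Python and the standard-library sqlite3 module. This is only a minimal sketch: the slide does not name a DBMS for this layout, the primary key is an assumption, and the helper names build_basic_layout and accounts_for_tenant are made up for illustration.

```python
import sqlite3


def build_basic_layout():
    """Create one shared Account table that holds the rows of every tenant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Account ("
        " TenId INTEGER NOT NULL,"
        " AcctId INTEGER NOT NULL,"
        " Name TEXT,"
        " PRIMARY KEY (TenId, AcctId))"  # assumed key, not stated on the slide
    )
    conn.executemany(
        "INSERT INTO Account (TenId, AcctId, Name) VALUES (?, ?, ?)",
        [(17, 1, "Acme"), (17, 2, "Gump"), (35, 1, "Ball"), (42, 1, "Big")],
    )
    return conn


def accounts_for_tenant(conn, tenant_id):
    """Restrict the shared table to one tenant through the TenId column."""
    cursor = conn.execute(
        "SELECT AcctId, Name FROM Account WHERE TenId = ? ORDER BY AcctId",
        (tenant_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_basic_layout()
    print(accounts_for_tenant(db, 17))
```

Every tenant sees the same fixed set of columns, which is why this layout consolidates well but offers no extensibility.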

Private Table
- Each tenant gets its own private schema
  - No sharing
  - SQL transformation: renaming only
- High meta-data/data ratio
  - Lots of tables: linear in # of tenants, each with its own meta-data
  - Low buffer utilization (~8 KB for index/data for each table)
Account
TenId | AcctId | Name | ...
17    | 1      | Acme |
17    | 2      | Gump |
35    | 1      | Ball |
42    | 1      | Big  |
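"Renaming only" can be illustrated with a small rewrite function. This is a sketch under assumptions: the physical naming scheme (logical name followed by the tenant id, as in Account17) is taken from the later slides, the function name to_private_tables is hypothetical, and the regular expression does not handle string literals or columns that share a table's name.

```python
import re


def to_private_tables(logical_sql, logical_tables, tenant_id):
    """Rewrite logical table names to per-tenant names, e.g. Account -> Account17."""
    rewritten = logical_sql
    for table in logical_tables:
        # Match the table name only as a whole word, so AcctId is left alone.
        pattern = r"\b" + re.escape(table) + r"\b"
        rewritten = re.sub(pattern, f"{table}{tenant_id}", rewritten)
    return rewritten


if __name__ == "__main__":
    print(to_private_tables("SELECT Name FROM Account WHERE AcctId = 1", ["Account"], 17))
```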

CRM Schema

Handling Lots of Tables
- Simplifying assumption: no extensibility
- Experiment setup:
  - CRM schema with 10 tables
  - 10,000 tenants are packed onto one DBMS (DB2, 1 GB memory)
  - Data set size remains constant
- Parameter: schema variability (number of tenants per schema instance)
  - 0 (least variable): all tenants share one instance (= 10 tables)
  - 1 (most variable): each tenant has a separate instance (= 100k tables)
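The two endpoints above follow from simple arithmetic: tables = schema instances × 10. The sketch below reproduces them. The slide gives only the endpoints, so the linear scaling of instances for values between 0 and 1 is an assumption, and both function names are made up.

```python
def schema_instances(variability, tenants=10_000):
    """Number of schema instances.

    Assumption: instances grow linearly with the variability parameter,
    with at least one instance (the fully shared case).
    """
    return max(1, round(variability * tenants))


def total_tables(variability, tenants=10_000, tables_per_instance=10):
    """Total tables in the DBMS = instances x tables per CRM schema instance."""
    return schema_instances(variability, tenants) * tables_per_instance


if __name__ == "__main__":
    for v in (0.0, 1.0):
        print(v, total_tables(v))
```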

Handling Lots of Tables: Results
[Figure: results ranging from 10 fully shared tables to 100,000 private tables]

Extension Table
- Split off the extensions into separate tables
- Additional join at runtime
- Row column for reconstructing the row (discussion: consider Acct17)
Account17(A, N, H, B) =
  select Ae.A, Ae.N, Ha.H, Ha.B
  from Account_ext as Ae, Healthcare_account as Ha
  where Ae.Tenant = Ha.Tenant and Ae.Row = Ha.Row and Ae.Tenant = 17
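The reconstruction query for Account17 can be run as-is against SQLite. The sketch below assumes column types, quotes Row because it is a keyword in SQLite, and uses made-up hospital and bed values purely as sample data; build_extension_layout and account17 are hypothetical helper names.

```python
import sqlite3


def build_extension_layout():
    """Base table with the common attributes plus one healthcare extension table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE Account_ext (Tenant INTEGER, "Row" INTEGER, A INTEGER, N TEXT);
        CREATE TABLE Healthcare_account (Tenant INTEGER, "Row" INTEGER, H TEXT, B INTEGER);
        INSERT INTO Account_ext VALUES
            (17, 0, 1, 'Acme'), (17, 1, 2, 'Gump'), (35, 0, 1, 'Ball'), (42, 0, 1, 'Big');
        -- Illustrative extension values; they are not taken from the slides.
        INSERT INTO Healthcare_account VALUES
            (17, 0, 'St. Mary', 135), (17, 1, 'State', 1042);
    ''')
    return conn


def account17(conn):
    """Rebuild tenant 17's logical Account table with one join on (Tenant, Row)."""
    cursor = conn.execute('''
        SELECT Ae.A, Ae.N, Ha.H, Ha.B
        FROM Account_ext AS Ae, Healthcare_account AS Ha
        WHERE Ae.Tenant = Ha.Tenant AND Ae."Row" = Ha."Row" AND Ae.Tenant = 17
        ORDER BY Ae."Row"
    ''')
    return cursor.fetchall()


if __name__ == "__main__":
    print(account17(build_extension_layout()))
```

Tenants 35 and 42 have no healthcare extension, so their rows live only in Account_ext and never take part in this join.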

Extension Table (Cont’d)
- Good: better consolidation than the Private Table layout
  - Common attributes go to the same table
- Bad: number of tables still grows in proportion to the number of tenants
  - Tenants in the same domain may often have varied schemas
  - So there is a large # of *extension* tables (such as Healthcare_account)

Universal Table
- Generic structure with VARCHAR value columns
  - The n-th column of a logical table (private after renaming) is mapped to ColN in a universal table
- Extensibility: # of columns may expand as needed
- Disadvantages
  - Very wide rows → many NULL values
  - Not type-safe → casting necessary
  - No index support (note: a column index is not very meaningful)
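A small SQLite sketch makes the two main drawbacks visible: values come back as strings and must be cast, and tenants with fewer columns leave NULLs behind. The four-column width, the Tenant and Table bookkeeping columns, the sample values, and the helper names are all assumptions for illustration.

```python
import sqlite3


def build_universal():
    """One generic table; every value column is VARCHAR regardless of logical type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Universal (Tenant INTEGER, "Table" INTEGER,'
        " Col1 VARCHAR, Col2 VARCHAR, Col3 VARCHAR, Col4 VARCHAR)"
    )
    rows = [
        (17, 0, "1", "Acme", "St. Mary", "135"),  # illustrative values
        (17, 0, "2", "Gump", "State", "1042"),
        (35, 1, "1", "Ball", None, None),         # unused columns stay NULL
    ]
    conn.executemany("INSERT INTO Universal VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def account17_from_universal(conn):
    """Map Col1..Col4 back to (A, N, H, B); the integer columns need explicit casts."""
    cursor = conn.execute(
        "SELECT CAST(Col1 AS INTEGER), Col2, Col3, CAST(Col4 AS INTEGER)"
        ' FROM Universal WHERE Tenant = 17 AND "Table" = 0 ORDER BY 1'
    )
    return cursor.fetchall()


def null_cells(conn, tenant_id):
    """Count the NULL value cells stored for one tenant."""
    cursor = conn.execute(
        "SELECT SUM((Col1 IS NULL) + (Col2 IS NULL) + (Col3 IS NULL) + (Col4 IS NULL))"
        " FROM Universal WHERE Tenant = ?",
        (tenant_id,),
    )
    return cursor.fetchone()[0]


if __name__ == "__main__":
    db = build_universal()
    print(account17_from_universal(db))
    print(null_cells(db, 35))
```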

Pivot Table
- Each field of a row in the logical table is given its own row
- One pivot table per data type (e.g., int, string)

Reconstruct Tenant Table from Pivot Table (only Hospital & Beds)
Account17(H, B) =
  select Ps.Str, Pi.Int
  from Pivot_str as Ps, Pivot_int as Pi
  where Ps.Tenant = Pi.Tenant and Ps.Table = Pi.Table and Ps.Row = Pi.Row
    and Ps.Col = 2 and Pi.Col = 3
    and Ps.Tenant = 17 and Ps.Table = 0
- The join aligns the string and int pivot rows (same tenant, table, row)

Reconstruct Tenant Table (cont’d)
- align: same tenant, table, row
- 4-way join
Account17(A, N, H, B) =
  select Pi’.Int, Ps’.Str, Ps.Str, Pi.Int
  from Pivot_int as Pi’, Pivot_str as Ps’, Pivot_str as Ps, Pivot_int as Pi
  where Pi’.Tenant = Ps’.Tenant and Ps’.Tenant = Ps.Tenant and Ps.Tenant = Pi.Tenant
    and Pi’.Table = Ps’.Table and Ps’.Table = Ps.Table and Ps.Table = Pi.Table
    and Pi’.Row = Ps’.Row and Ps’.Row = Ps.Row and Ps.Row = Pi.Row
    and Pi’.Col = 0 and Ps’.Col = 1 and Ps.Col = 2 and Pi.Col = 3
    and Ps.Tenant = 17 and Ps.Table = 0
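Both reconstructions can be checked against a tiny pivot layout in SQLite. In this sketch the filter on the integer column (Col = 3) is placed on the Pivot_int alias, the primed aliases are renamed Pi2 and Ps2, keyword-like column names are quoted, and the stored values are illustrative rather than taken from the slides.

```python
import sqlite3


def build_pivot():
    """One pivot table per type; each logical field becomes one physical row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE Pivot_int (Tenant INTEGER, "Table" INTEGER, Col INTEGER, "Row" INTEGER, "Int" INTEGER);
        CREATE TABLE Pivot_str (Tenant INTEGER, "Table" INTEGER, Col INTEGER, "Row" INTEGER, "Str" TEXT);
        INSERT INTO Pivot_int VALUES
            (17, 0, 0, 0, 1), (17, 0, 3, 0, 135),
            (17, 0, 0, 1, 2), (17, 0, 3, 1, 1042), (35, 1, 0, 0, 1);
        INSERT INTO Pivot_str VALUES
            (17, 0, 1, 0, 'Acme'), (17, 0, 2, 0, 'St. Mary'),
            (17, 0, 1, 1, 'Gump'), (17, 0, 2, 1, 'State'), (35, 1, 1, 0, 'Ball');
    ''')
    return conn


def hospital_and_beds(conn):
    """Account17(H, B): two fields, so a single aligning join is enough."""
    return conn.execute('''
        SELECT Ps."Str", Pi."Int"
        FROM Pivot_str AS Ps, Pivot_int AS Pi
        WHERE Ps.Tenant = Pi.Tenant AND Ps."Table" = Pi."Table" AND Ps."Row" = Pi."Row"
          AND Ps.Col = 2 AND Pi.Col = 3
          AND Ps.Tenant = 17 AND Ps."Table" = 0
        ORDER BY Ps."Row"
    ''').fetchall()


def full_row(conn):
    """Account17(A, N, H, B): four fields, so a 4-way join."""
    return conn.execute('''
        SELECT Pi2."Int", Ps2."Str", Ps."Str", Pi."Int"
        FROM Pivot_int AS Pi2, Pivot_str AS Ps2, Pivot_str AS Ps, Pivot_int AS Pi
        WHERE Pi2.Tenant = Ps2.Tenant AND Ps2.Tenant = Ps.Tenant AND Ps.Tenant = Pi.Tenant
          AND Pi2."Table" = Ps2."Table" AND Ps2."Table" = Ps."Table" AND Ps."Table" = Pi."Table"
          AND Pi2."Row" = Ps2."Row" AND Ps2."Row" = Ps."Row" AND Ps."Row" = Pi."Row"
          AND Pi2.Col = 0 AND Ps2.Col = 1 AND Ps.Col = 2 AND Pi.Col = 3
          AND Ps.Tenant = 17 AND Ps."Table" = 0
        ORDER BY Ps."Row"
    ''').fetchall()


if __name__ == "__main__":
    db = build_pivot()
    print(hospital_and_beds(db))
    print(full_row(db))
```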

Pivot Table: Performance
- Generic type-safe structure
- Eliminates handling many NULL values
- Performance
  - Depends on the column selectivity of the query (number of reconstructing joins)
  - E.g., a query on A17(H, B) is more selective than one on A17(A, N, H, B); see the previous example

Pivot Table → Chunk Table
- Pivot table: one row for each field (Row 0 is stored as Field 0, Field 1, Field 2, Field 3)
- Chunk table: one row for each chunk (Row 0 is stored as Chunk 0, Chunk 1)

How to Chunk or Fragment Original Rows
- Many possible fragmentations
- One idea: group fields by their “popularity”
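One possible reading of grouping by popularity is sketched below: order the fields by how often they are used and cut the ordered list into fixed-width chunks. The slide only names the idea, so the usage counts, the fixed chunk width, the tie-breaking rule, and the function name chunk_by_popularity are all assumptions.

```python
def chunk_by_popularity(field_usage, chunk_width):
    """Group fields into chunks of at most chunk_width, most-used fields first.

    field_usage maps a field name to a usage count (a hypothetical popularity measure).
    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(field_usage, key=lambda name: (-field_usage[name], name))
    return [ordered[i:i + chunk_width] for i in range(0, len(ordered), chunk_width)]


if __name__ == "__main__":
    usage = {"Aid": 90, "Name": 80, "Hospital": 20, "Beds": 15}  # made-up counts
    print(chunk_by_popularity(usage, 2))
```

With this grouping, queries that touch only popular fields read a single chunk and need no aligning join.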

Chunk Table
Account17(A, N, H, B) =
  select Cis.Int1, Cis.Str1, Cis’.Str1, Cis’.Int1
  from Chunk Cis, Chunk Cis’
  where Cis.Chunk = 0 and Cis’.Chunk = 1
    and Cis.Row = Cis’.Row and Cis.Tenant = Cis’.Tenant and Cis.Table = Cis’.Table
    and Cis.Tenant = 17 and Cis.Table = 0
- 2-way join over Chunk 0 and Chunk 1 of each row:
  1. align rows
  2. merge chunks
  3. reconstruct the original row
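The 2-way join can be exercised in SQLite as well. This sketch assumes a single chunk table named Chunk with one integer and one string value column (Int1, Str1), restricts the query to tenant 17 and table 0, renames the aliases to C0 and C1, and uses illustrative values.

```python
import sqlite3


def build_chunk_table():
    """Each physical row stores one chunk (here: one int and one string field)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Chunk (Tenant INTEGER, "Table" INTEGER, Chunk INTEGER,'
        ' "Row" INTEGER, Int1 INTEGER, Str1 TEXT)'
    )
    rows = [
        (17, 0, 0, 0, 1, "Acme"),        # chunk 0 holds (A, N)
        (17, 0, 1, 0, 135, "St. Mary"),  # chunk 1 holds (B, H); sample values
        (17, 0, 0, 1, 2, "Gump"),
        (17, 0, 1, 1, 1042, "State"),
        (35, 1, 0, 0, 1, "Ball"),
    ]
    conn.executemany("INSERT INTO Chunk VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def account17_from_chunks(conn):
    """Align chunk 0 and chunk 1 on (Tenant, Table, Row) and merge them into one row."""
    return conn.execute('''
        SELECT C0.Int1, C0.Str1, C1.Str1, C1.Int1
        FROM Chunk AS C0, Chunk AS C1
        WHERE C0.Chunk = 0 AND C1.Chunk = 1
          AND C0."Row" = C1."Row" AND C0.Tenant = C1.Tenant AND C0."Table" = C1."Table"
          AND C0.Tenant = 17 AND C0."Table" = 0
        ORDER BY C0."Row"
    ''').fetchall()


if __name__ == "__main__":
    print(account17_from_chunks(build_chunk_table()))
```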

Chunk Table vs. Universal Table
- # of rows (for each original row) = # of chunks
- Universal table as extreme chunking: only one chunk per row (Chunk 0), so only one physical row for each original row
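The relationship between chunk width, physical rows, and aligning joins is plain arithmetic, sketched below for the four-field Account17 example. The simplification that any field fits any chunk slot (types are ignored) is an assumption, as are the function names.

```python
import math


def physical_rows(num_fields, chunk_width):
    """Physical rows stored per logical row = number of chunks."""
    return math.ceil(num_fields / chunk_width)


def aligning_joins(num_fields, chunk_width):
    """Reconstructing a full logical row needs one join less than the number of chunks."""
    return physical_rows(num_fields, chunk_width) - 1


if __name__ == "__main__":
    # width 1 behaves like a pivot table, width 4 like a universal table
    for width in (1, 2, 4):
        print(width, physical_rows(4, width), aligning_joins(4, width))
```

Width 1 corresponds to the pivot layout (4 rows, the 4-way join), width 2 to the two-chunk example (2 rows, the 2-way join), and width 4 to the universal layout (1 row, no join).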

Chunk Table Performance
- Fewer joins for reconstruction
- Indexable
- Reduced meta-data/data ratio, dependent on chunk size

Chunk Folding: Alternative Chunking
- Mixes Extension and Chunk Tables: keep the base table, chunk the extensions
- Optimal row fragmentation depends on, e.g.:
  - Workload
  - Data distribution
  - Data popularity

Querying Chunk Tables
- Query transformation
  - Row reconstruction needs many self- and equi-joins
  - Can be automatically translated
- Compilation scheme:
  1. Collect all table names and their corresponding columns from the logical source query
  2. Obtain table definitions: for each table, obtain the Chunk Tables and the meta-data identifiers representing the used columns, then generate a query that filters the correct columns (based on the meta-data identifiers) and aligns the different chunk relations on their ROW column
  3. Extend each table reference in the logical source query by the corresponding table definition (obtained in step 2)
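The compilation scheme can be sketched in Python on top of SQLite. This is an illustration under stated assumptions, not the presenters' transformation engine: the META dictionary, the Chunk table layout, the sample values, and every function name are hypothetical; step 1 is supplied by hand instead of parsing SQL; and step 3 uses a naive string replacement.

```python
import sqlite3

# Hypothetical meta-data: logical column -> (chunk id, physical column).
META = {
    "Account17": {
        "tenant": 17,
        "table": 0,
        "columns": {"Aid": (0, "Int1"), "Name": (0, "Str1"),
                    "Hospital": (1, "Str1"), "Beds": (1, "Int1")},
    }
}


def build_chunk_db():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Chunk (Tenant INTEGER, "Table" INTEGER, Chunk INTEGER,'
                 ' "Row" INTEGER, Int1 INTEGER, Str1 TEXT)')
    conn.executemany("INSERT INTO Chunk VALUES (?, ?, ?, ?, ?, ?)", [
        (17, 0, 0, 0, 1, "Acme"), (17, 0, 1, 0, 135, "St. Mary"),  # sample values
        (17, 0, 0, 1, 2, "Gump"), (17, 0, 1, 1, 1042, "State"),
    ])
    return conn


def table_definition(logical_table, used_columns, meta=META):
    """Step 2: filter the chunks holding the used columns and align them on Row."""
    info = meta[logical_table]
    placement = info["columns"]
    chunks = sorted({placement[c][0] for c in used_columns})
    first = f"c{chunks[0]}"
    select = ", ".join(f"c{placement[c][0]}.{placement[c][1]} AS {c}" for c in used_columns)
    sources = ", ".join(f"Chunk AS c{ch}" for ch in chunks)
    tenant, table = info["tenant"], info["table"]
    conds = [f"{first}.Tenant = {tenant}", f'{first}."Table" = {table}']
    for ch in chunks:
        conds.append(f"c{ch}.Chunk = {ch}")
        if ch != chunks[0]:  # aligning join against the first chunk
            conds.append(f"c{ch}.Tenant = {first}.Tenant")
            conds.append(f'c{ch}."Table" = {first}."Table"')
            conds.append(f'c{ch}."Row" = {first}."Row"')
    where = " AND ".join(conds)
    return f"(SELECT {select} FROM {sources} WHERE {where})"


def transform(logical_sql, used):
    """Step 3: replace each logical table reference by its table definition.

    `used` is the step 1 result (table -> used columns), supplied by hand here.
    """
    physical_sql = logical_sql
    for table, columns in used.items():
        definition = table_definition(table, columns)
        physical_sql = physical_sql.replace(f"FROM {table}", f"FROM {definition} AS {table}")
    return physical_sql


if __name__ == "__main__":
    logical = "SELECT Beds FROM Account17 WHERE Hospital = 'State'"
    physical = transform(logical, {"Account17": ["Beds", "Hospital"]})
    print(physical)
    print(build_chunk_db().execute(physical).fetchall())
```

Because Beds and Hospital both sit in chunk 1 under this meta-data, the generated table definition reads a single chunk and contains no aligning join; asking for all four columns adds one.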

Query Example: Step 1
Step 1 result:
  tables = {Account17}
  columns = {Account17.Beds, Account17.Hospital}

Query Example: Step 2 (Pivot Table)

Query Example: Step 3 (Chunk Table)

Join Overhead Costs
[Figure: join overhead compared with the case of no aligning joins]

Summary
- Multi-tenancy databases are critical to scaling a SaaS solution
- Varied schema layout schemes
  - Different degrees of consolidation & extensibility
  - Optimal layout depends on the particular data set, workload, etc.
- Novel Chunk Table layout
  - Wider chunks tend to achieve better performance
  - Improvement slows down at some point (e.g., |chunk| = 15)
  - Response time using chunking may approach that of conventional tables
- More work is needed on finding a good layout when multiple factors are involved