What is Oracle Database Sharding and What Is It Used For?


1 What is Oracle Database Sharding and What Is It Used For?
Ron Soltani Senior Principal Instructor Oracle University October 2018

2 What is Oracle Database Sharding and What Is It Used For?
NEXT 15-MINUTE BRIEFING What is Oracle Database Sharding and What Is It Used For?

3 Presentation Objectives
Database Sharding, 18c Sharding Enhancements, Conclusion

5 What Is Database Sharding?
Shared-nothing architecture for scalability and availability. Horizontally partitioned data across independent databases. Loosely coupled data tier without clusterware. (Diagram: an unsharded table in one database versus a sharded table split across three databases, Server A, Server B, and Server C, which together form a sharded database, or SDB.) Sharding is a data tier architecture where data is horizontally partitioned across independent databases. Each database in such a configuration is called a shard. All shards together make up a single logical database, which is known as a sharded database or SDB. Horizontal partitioning involves splitting a database table across shards so that each shard contains the table with the same columns but a different subset of rows. The diagram in the slide shows an unsharded table on the left with the rows represented by different colors. On the right, the same table data is shown horizontally partitioned across three shards or independent databases. Each partition of the logical table resides in a specific shard. Such a table is referred to as a sharded table. Sharding is a shared-nothing database architecture because shards do not share physical resources such as CPU, memory, or storage devices. Shards are also loosely coupled in terms of software; they do not run clusterware. From a database administrator’s perspective, an SDB consists of multiple databases that can be managed either collectively or individually. However, from an application developer’s perspective, an SDB looks like a single database: the number of shards and the distribution of data across them are completely transparent to database applications.

6 Sharding: Benefits Extreme scalability by adding shards (independent databases) Fault containment by eliminating single points of failure Global data distribution with the ability to store particular data in a specific shard Rolling upgrades with independent availability of shards Simplicity of cloud deployment with different sized shards Sharding eliminates performance bottlenecks and makes it possible to linearly scale performance and capacity by adding shards. Sharding is a shared-nothing architecture that eliminates single points of failure—such as shared disks, SAN, and clusterware—and provides strong fault isolation. The failure or slowdown of one shard does not affect the performance or availability of other shards. Sharding enables storing particular data close to its consumers and satisfying regulatory requirements when data must be located in a particular jurisdiction. Applying configuration changes on one shard at a time does not affect other shards, and allows administrators to first test changes on a small subset of data. Sharding is well suited to deployment in the cloud. Shards may be sized as required to accommodate whatever cloud infrastructure is available and still achieve required service levels. A sharded database (logical representation) supports up to 1,000 shards (independent databases). Oracle Database 18c: New Features for Administrators A - 6

7 Oracle Sharding: Advantages
Relational schemas Database partitioning ACID properties and read consistency SQL and other programmatic interfaces Complex data types Online schema changes Multicore scalability Advanced security Compression High availability features Enterprise-scale backup and recovery Oracle Sharding provides the benefits of sharding without sacrificing the capabilities of an enterprise RDBMS. Oracle Database 18c: New Features for Administrators A - 7

8 Application Considerations for Sharding
Available only in new database creations Intended for OLTP applications that: Have a well-defined data model and data distribution strategy Have a hierarchical tree structure data model with a single root table Primarily access data by using a sharding key that is stable and with high cardinality Generally access data associated with a single value for the sharding key Use Oracle integrated connection pools (UCP, OCI, ODP.NET, and JDBC) to connect to the sharded database Oracle Sharding is for OLTP applications that are suitable for a sharded database. Existing applications that were never intended to be sharded require some level of redesign to achieve the benefits of a sharded architecture. In some cases, it may be as simple as providing the sharding key; in other cases, it may be impossible to horizontally partition the data and workload as required by a sharded database. Many customer-facing web applications, such as e-commerce, mobile, and social media are well- suited for sharding. Such applications have a well-defined data model and data distribution strategy (hash, range, list, or composite) and primarily access data by using a sharding key. Examples of sharding keys include customer_ID, account_number, and country_id. Applications also usually require partial denormalization of data to perform well with sharding. OLTP transactions that access data associated with a single value of the sharding key are the primary use cases for a sharded database—for example, lookup and update of a customer’s records, subscriber documents, financial transactions, e-commerce transactions, and so on. Because all the rows that have the same value of the sharding key are guaranteed to be on the same shard, such transactions are always single-shard and executed with the highest performance and provide the highest level of consistency. Multi-shard operations are supported, but with a reduced level of performance and consistency. Such transactions include simple aggregations, reporting, and so on, and play a minor role in a sharded application relative to workloads dominated by single-shard OLTP transactions. Oracle Database 18c: New Features for Administrators A - 8

9 Components of Database Sharding
Sharded database (SDB) Shards Global service Shard catalog Shard directors Connection pools Management tools GDSCTL EMCC 13c Shards are independent Oracle databases that are hosted on database servers that have their own local resources: CPU, memory, and disk. No shared storage is required across the shards. A sharded database is a collection of shards. Shards can all be placed in one region or can be placed in different regions. A region in the context of Oracle Sharding represents a data center or multiple data centers that are in close network proximity. All shards of an SDB always have the same database schema and contain the same schema objects. A global service is an extension to the notion of a traditional database service. All the properties of traditional database services are supported for global services. For sharded databases, additional properties are set for global services, for example, database role, replication lag tolerance, region affinity between clients and shards, and so on. For a read/write transactional workload, a single global service is created to access data from any primary shard in an SDB. The shard catalog is an enhanced Global Data Services (GDS) catalog to support Oracle Sharding. A shard director is a specific implementation of a global service manager that acts as a regional listener for clients that connect to an SDB, and maintains a current topology map of the SDB. Oracle supports connection pooling in data access drivers such as OCI, JDBC, ODP.NET, and so on. In Oracle 12c Release 2, these drivers can recognize sharding keys that are specified as part of a connection request. The diagram in the slide shows the typical components of Oracle Sharding. Oracle Database 18c: New Features for Administrators A - 9

10 Shard Catalog Is an enhanced Global Data Services (GDS) catalog containing persistent sharding configuration data Is used to initiate all configuration changes Is used for connections for all DDL commands Contains the master copy of all duplicated tables Replicates changes to duplicated tables by using materialized views Acts as a query coordinator to process multi-shard queries The shard catalog is a special-purpose Oracle Database that is a persistent store for SDB configuration data, and plays a key role in centralized management of a sharded database. All configuration changes, such as adding and removing shards and global services, are initiated on the shard catalog. All DDLs in an SDB are executed by connecting to the shard catalog. The shard catalog also contains the master copy of all duplicated tables in an SDB. It uses materialized views to automatically replicate changes to duplicated tables in all shards. The shard catalog database also acts as a query coordinator that is used to process multi-shard queries and queries that do not specify a sharding key. High availability for the shard catalog can be implemented by using Oracle Data Guard. The availability of the shard catalog has no impact on the availability of the SDB. An outage of the shard catalog affects only the ability to perform maintenance operations or multi-shard queries during the brief period required to complete an automatic failover to a standby shard catalog. OLTP transactions continue to be routed and executed by the SDB, and are unaffected by a catalog outage. Oracle Database 18c: New Features for Administrators A - 10

11 Shard Directors The following are the key capabilities of shard directors: Maintaining runtime data about SDB configuration and availability of shards Measuring network latency between its own and other regions Acting as a regional listener for clients to connect to an SDB Managing global services Performing connection load balancing The global service manager was introduced in Oracle Database 12c to route connections based on database role, load, replication lag, and locality. In support of Oracle Sharding, global service managers have been enhanced to support the routing of connections based on the location of data. A global service manager, in the context of Oracle Sharding, is known as a shard director. A shard director is a specific implementation of a global service manager that acts as a regional listener for clients that connect to an SDB, and maintains a current topology map of the SDB. Based on the sharding key that is passed during a connection request, it routes the connections to the appropriate shard. For a typical SDB, a set of shard directors is installed on dedicated low-end commodity servers in each region. Multiple shard directors should be deployed for high availability. In Oracle Database 12c Release 2, up to five shard directors can be deployed in a given region. Oracle Database 18c: New Features for Administrators A - 11
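As a concrete illustration of how shard directors are created and started, the following GDSCTL sketch is hypothetical: the director name, listener port, catalog connect string, region name, and password are placeholders rather than values from this deck, and the exact option set can vary by release, so check the GDSCTL reference for your version.
GDSCTL> add gsm -gsm sharddirector1 -listener 1522 -pwd gsmcatuser_password -catalog cathost:1521:shardcat -region region1
GDSCTL> start gsm -gsm sharddirector1
GDSCTL> status gsm
In practice, at least two shard directors per region are deployed in this way so that the routing tier itself is not a single point of failure.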

12 Complete Deployment of a System-Managed SDB
(Diagram: clients and their connection pools in two regions, Availability_Domain1 and Availability_Domain2; shard directors Shdir1,2 and Shdir3,4; shard catalog shardcat with standby catalog shardcat_stdby; primary shards in shardgroup shgrp1 and HA standby shards in shardgroup shgrp2, protected by Data Guard fast-start failover.) Oracle Sharding is built on the Global Data Services (GDS) architecture. GDS is the Oracle scalability, availability, and manageability framework for multidatabase environments. GDS presents a multi-database configuration to database clients as a single logical database by transparently providing failover, load balancing, and centralized management for database services. GDS routes a client request to an appropriate database based on availability, load, network latency, replication lag, and other parameters. In Oracle Database 12c Release 1, GDS supports only fully replicated databases: it assumes that when a global database service is enabled on multiple databases, all of them contain a full set of data provided by the service. Oracle Database 12c Release 2 extends the concept of a GDS pool to a sharded GDS pool. Unlike the regular GDS pool, which contains a set of fully replicated databases, the sharded GDS pool contains all shards of an SDB and their replicas. For database clients, the sharded GDS pool creates the illusion of a single sharded database, the same way the regular GDS pool creates the illusion of a single non-sharded database. A typical GDS architecture has two data centers (for example, APAC and EMEA) and two sets of replicated databases (for example, SALES and HR), with the GDS catalog using Oracle Data Guard between the two regions for high availability; in such a configuration the SALES database might be replicated with Active Data Guard and the HR database with Oracle GoldenGate.

13 Creating Sharded Tables
Use a sharding key (partition key) to distribute partitions across shards at the tablespace level. The NUMBER, INTEGER, SMALLINT, RAW, (N)VARCHAR, (N)CHAR, DATE, and TIMESTAMP data types are supported for the sharding key. SQL> CREATE SHARDED TABLE customers ( CustNo NUMBER NOT NULL , Name VARCHAR2(50) , Address VARCHAR2(250) , CONSTRAINT RootPK PRIMARY KEY(CustNo) ) PARTITION BY CONSISTENT HASH (CustNo) PARTITIONS AUTO TABLESPACE SET ts1; A sharded table is a table that is partitioned into smaller and more manageable pieces among multiple database instances, called shards. Oracle Sharding is implemented based on the Oracle Database partitioning feature. It is essentially distributed partitioning because it extends partitioning by supporting the distribution of table partitions across shards. Partitions are distributed across shards at the tablespace level, based on a sharding key. Each partition of a sharded table resides in a separate tablespace, and each tablespace is associated with a specific shard. Depending on the sharding method, the association can be established automatically or defined by the administrator. Even though the partitions of a sharded table reside in multiple shards, to the application, the table looks and behaves exactly the same as a partitioned table in a single database. The SQL statements that are issued by an application need not refer to shards or depend on the number of shards and their configuration. The slide syntax shows a table that is partitioned by consistent hash, which is a special type of hash partitioning that is commonly used in scalable distributed systems. This technique automatically spreads tablespaces across shards to provide an even distribution of data and workload. The database creates and manages tablespaces as a unit, called a tablespace set. The PARTITIONS AUTO clause specifies that the number of partitions should be automatically determined. This type of hashing provides more flexibility and efficiency in migrating data between shards, which is important for elastic scalability. Oracle Database 18c: New Features for Administrators A - 13
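The tablespace set ts1 referenced above must exist before the sharded table is created. A minimal sketch of creating it follows; the sizing clause is illustrative, not taken from the slide.
SQL> CREATE TABLESPACE SET ts1
     USING TEMPLATE (DATAFILE SIZE 100M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED);
Because all DDL in an SDB is executed through the shard catalog, this single statement results in tablespaces being created on every shard in the configuration.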

14 Sharded Table Family A set of tables sharded in the same way
Only a single root table (table with no parent) per family Only a single table family per SDB Only a single sharding method (partitioning method) per SDB, that cannot be changed after creation A sharded table family is a set of tables that are sharded in the same way. Parent-child relationships between database tables with a referential constraint in a child table (foreign key) that refers to the primary key of the parent table form a tree-like structure where every child has a single parent. Such a set of tables is referred to as a table family. A table in a table family that has no parent is called the root table. There can be only one root table in a table family. In Oracle Database 12c Release 2, only a single table family is supported in an SDB. Reference partitioning is the recommended way to create a sharded table family. The corresponding partitions of all the tables in the family are stored in the same tablespace set. Partitioning by reference simplifies the syntax because the partitioning scheme is specified only for the root table. Also, partition management operations that are performed on the root table are automatically propagated to its descendants. For example, when adding a partition to the root table, a new partition is created on all its descendants. The partitioning column is present in all tables in the family. This is despite the fact that reference partitioning, in general, allows a child table to be equi- partitioned with the parent table without having to duplicate the key columns in the child table. The reason for this is that reference partitioning requires a primary key in a parent table because the primary key must be specified in the foreign key constraint of a child table that is used to link the child to its parent. However, a primary key on a sharded table must either be the same as the sharding key or contain the sharding key as the leading column. This makes it possible to enforce global uniqueness of a primary key without coordination with other shards, a critical requirement for linear scalability. SQL> CREATE SHARDED TABLE Orders ( OrderNo NUMBER NOT NULL , CustNo NUMBER NOT NULL , OrderDate DATE , CONSTRAINT OrderPK PRIMARY KEY (CustNo, OrderNo) , CONSTRAINT CustFK FOREIGN KEY (CustNo) REFERENCES Customers(CustNo) ) PARTITION BY REFERENCE (CustFK); Oracle Database 18c: New Features for Administrators A - 14
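To round out the Customers-Orders table family referenced above (and echoed in the Customers-Orders-LineItems example on the next slide), a hedged sketch of the third level of the hierarchy might look like the following; the column list is illustrative rather than from this deck.
SQL> CREATE SHARDED TABLE LineItems
     ( OrderNo  NUMBER NOT NULL
     , CustNo   NUMBER NOT NULL
     , StockNo  NUMBER NOT NULL
     , Quantity NUMBER
     , CONSTRAINT LinePK PRIMARY KEY (CustNo, OrderNo, StockNo)
     , CONSTRAINT LineFK FOREIGN KEY (CustNo, OrderNo) REFERENCES Orders(CustNo, OrderNo)
     ) PARTITION BY REFERENCE (LineFK);
Note that the sharding key CustNo again leads the primary key, and that partition maintenance performed on Customers cascades automatically to Orders and LineItems.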

15 Partitions, Tablespaces, and Chunks
Each partition of a sharded table is stored in a separate tablespace. The corresponding data value partitions of all the tables in a table family are always stored in the same shard. Guaranteed when the tables in a table family are created in the same tablespace set The child tables of a table family can be stored in separate tablespace sets. Uses chunks or groups of tablespaces that contain a single partition from each table with the corresponding partitions in the family Distribution of partitions across shards is achieved by creating partitions in tablespaces that reside on different shards. Each partition of a sharded table is stored in a separate tablespace, making the tablespace the unit of data distribution in an SDB. To minimize the number of multi-shard joins, the corresponding partitions of all the tables in a table family are always stored in the same shard. This is guaranteed when the tables in a table family are created in the same set of distributed tablespaces as shown in the syntax examples for this lesson, where the tablespace set ts1 is used for all tables. However, it is possible to create different tables from a table family in different sets of tablespaces, for example, the Customers table in the tablespace set ts1 and Orders in the tablespace set ts2. In this case, it must be guaranteed that the tablespace that stores partition 1 of Customers always resides in the same shard as the tablespace that stores partition 1 of Orders. To support this functionality, a set of corresponding partitions from all the tables in a table family, called a chunk, is formed. A chunk contains a single partition from each table of a table family. The illustration in the slide shows a chunk that contains corresponding partitions from the tables of the Customers-Orders-LineItems schema. Oracle Database 18c: New Features for Administrators A - 15
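To inspect how chunks are currently laid out, GDSCTL can report the shard topology and the chunk-to-shard mapping; a minimal sketch, run from a GDSCTL session connected to the shard catalog, is:
GDSCTL> config shard
GDSCTL> config chunks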

16 Sharding Methods: System-Managed Sharding
Data is automatically distributed across shards using partitioning by consistent hash. System-managed sharding is a sharding method that does not require the user to specify a mapping of data to shards. Data is automatically distributed across shards using partitioning by consistent hash. The partitioning algorithm evenly and randomly distributes data across shards. The distribution used in system-managed sharding is intended to eliminate hot spots and provide uniform performance across shards. Oracle Sharding automatically maintains balanced distribution of data when shards are added to or removed from an SDB. Consistent hash is a partitioning strategy that is commonly used in scalable distributed systems. It is different from traditional hash partitioning. With traditional hashing, the bucket number is calculated as HF(key) % N where HF is a hash function and N is the number of buckets. This approach works fine if N is constant, but requires reshuffling of all data when N changes. More advanced algorithms, such as linear hashing, do not require rehashing of the entire table to add a hash bucket, but they impose restrictions on the number of buckets, such as the number of buckets can only be a power of 2, and on the order in which the buckets can be split. The implementation of consistent hashing that is used in Oracle Sharding avoids these limitations by dividing the possible range of values of the hash function (for example, from 0 to 232) into a set of N adjacent intervals, and assigning each interval to a chunk. In this example, the SDB contains chunks, and each chunk gets assigned a range of 222 hash values. Therefore, partitioning by consistent hash is essentially partitioning by the range of hash values. Oracle Database 18c: New Features for Administrators A - 16

17 Sharding Methods: Composite Sharding
Data is first partitioned by list or range across multiple shardspaces, and then further partitioned by consistent hash across multiple shards in each shardspace. The composite sharding method allows you to create multiple shardspaces for different subsets of data in a table partitioned by consistent hash. A shardspace is a set of shards that stores data that corresponds to a range or list of key values. System-managed sharding does not give you any control over the assignment of data to shards. When sharding by consistent hash on a primary key, there is often a requirement to differentiate subsets of data within an SDB in order to store them in different geographic locations, allocate to them different hardware resources, or configure high availability and disaster recovery differently. Usually this differentiation is done based on the value of another (non-primary) column, for example, customer location or a class of service. With composite sharding, data is first partitioned by list or range across multiple shardspaces, and then further partitioned by consistent hash across multiple shards in each shardspace. The two levels of sharding make it possible to automatically maintain a balanced distribution of data across shards in each shardspace, and at the same time, partition data across shardspaces. The slide illustration shows two tablespace sets: tbs1 at the top and tbs2 at the bottom. Tablespace set tbs1 is labeled “Shardspace for GOLD customers - shspace1” and contains three shards, each of which contains a range of tablespaces and their respective partitions. Tablespace set tbs2 is labeled “Shardspace for SILVER customers - shspace2” and contains four shards, each of which contains a range of tablespaces and their respective partitions. Oracle Database 18c: New Features for Administrators A - 17
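A hedged sketch of what the composite sharding DDL for the GOLD/SILVER example might look like is shown below. It assumes the shardspaces shspace1 and shspace2 have already been defined with GDSCTL ADD SHARDSPACE; the column names and class values are invented for illustration.
SQL> CREATE TABLESPACE SET tbs1 IN SHARDSPACE shspace1;
SQL> CREATE TABLESPACE SET tbs2 IN SHARDSPACE shspace2;

SQL> CREATE SHARDED TABLE customers
     ( CustNo NUMBER NOT NULL
     , Name   VARCHAR2(50)
     , class  VARCHAR2(6) NOT NULL
     , CONSTRAINT CustPK PRIMARY KEY (CustNo, class)
     )
     PARTITIONSET BY LIST (class)           -- super sharding key chooses the shardspace
     PARTITION BY CONSISTENT HASH (CustNo)  -- sharding key spreads data within the shardspace
     PARTITIONS AUTO
     ( PARTITIONSET gold   VALUES ('GOLD')   TABLESPACE SET tbs1
     , PARTITIONSET silver VALUES ('SILVER') TABLESPACE SET tbs2
     );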

18 Duplicated Tables Are nonsharded tables that duplicate data on all shards Help eliminate cross-shard queries Are created in the shard catalog Use materialized view replication Can be refreshed by using a refresh frequency (default 60 seconds) that is set with the SHRD_DUPL_TABLE_REFRESH_RATE initialization parameter Cannot be stored in tablespaces used for sharded tables In addition to sharded tables, an SDB can contain tables that are duplicated on all shards. For many applications, the number of database requests handled by a single shard can be maximized by duplicating read-only or read-mostly tables across all shards. This strategy is a good choice for relatively small tables that are often accessed together with sharded tables. A table with the same contents in each shard is called a duplicated table. Oracle Sharding synchronizes the contents of duplicated tables by using Materialized View Replication. A duplicated table on each shard is represented by a read-only materialized view. The master table for the materialized views is located in the shard catalog. The CREATE DUPLICATED TABLE statement automatically creates the master table, materialized views, and other objects required for materialized view replication. The materialized views on all the shards are automatically refreshed at a configurable frequency. The refresh frequency of all duplicated tables is controlled by the SHRD_DUPL_TABLE_REFRESH_RATE database initialization parameter. The default value for the parameter is 60 seconds. SQL> CREATE DUPLICATED TABLE Products ( StockNo NUMBER PRIMARY KEY , Description VARCHAR2(20) , Price NUMBER(6,2)); Oracle Database 18c: New Features for Administrators A - 18
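For example, to slow the refresh of duplicated tables from the 60-second default to five minutes, the parameter can be changed as sketched below; the value is illustrative, and the parameter is set on the databases that host the materialized views, that is, the shards.
SQL> ALTER SYSTEM SET SHRD_DUPL_TABLE_REFRESH_RATE = 300 SCOPE = BOTH;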

19 Routing in an Oracle Sharded Environment
Direct Routing based on sharding_key: for OLTP workloads that specify sharding_key (for example, customer_id) during connect; enabled by enhancements to mid-tier connection pools and drivers. Proxy Routing via a coordinator (shard catalog): for workloads that cannot specify sharding_key as part of a connection, for reporting and batch jobs, and for queries spanning one, several, or all shards. Direct Routing: In the first case (the first bullet point), a transaction happens on a single shard. In the second case (second bullet point), JDBC/UCP, OCI, and ODP.NET recognize the sharding keys. Proxy Routing: In the last case (last bullet point), the queries are executed in parallel across shards (for example, aggregates on sales data).

20 Direct Routing via Sharding Key
(Diagram: application tier with connection pools, routing tier with shard directors, and data tier with shards.) The sharding keys are provided by the applications at connection checkout. The client specifies the sharding key (for example, customer_id). The shard director looks up the key and redirects the client to the shard database that contains the data. The client then executes the SQL directly on the shard.

21 Connection Pool as Shard Director
(Diagram: application tier with connection pools, routing tier with shard directors, and data tier with shards.) Fast path for key access. Upon first connection to a shard, the connection pool retrieves all the key ranges in the shard and caches the key range mappings. A database request for a key that falls in any of the cached key ranges then goes directly to the shard (that is, it bypasses the shard director).

22 Proxy Routing: Limited to System Managed in 12.2.0.1
(Diagram: application tier with connection pools, routing tier with the coordinator (shard catalog) and shard directors, and data tier with shards.) Non-sharding-key access and multi-shard queries. A connection is made to the coordinator: applications connect to the catalog service via a separate connection pool, and the coordinator parses the SQL and proxies/routes the request to the correct shard. The same flow is used for multi-shard queries: the coordinator acts as the SQL proxy/router, and shard pruning and scatter-gather are supported. This feature is for developer convenience and not for high performance.
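As an illustration of proxy routing, the following hypothetical queries reuse the Customers and Orders tables defined earlier in this deck. Issued over a connection to the coordinator (catalog) service, with no sharding key supplied, they are parsed on the coordinator and scattered to the shards:
SQL> SELECT COUNT(*) FROM Orders;

SQL> SELECT c.Name, COUNT(o.OrderNo) AS order_count
     FROM   Customers c JOIN Orders o ON o.CustNo = c.CustNo
     GROUP  BY c.Name;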

23 Lifecycle Management of SDB
The DBA can manually move or split a chunk from one shard to another. When a new shard is added, chunks are automatically rebalanced. Before a shard is removed, chunks must be manually moved. Connection pools are notified (via ONS) about a split, a move, addition or removal of shards, auto-resharding, and read-only access operations. All shards can be patched with one command via opatchauto. EM supports monitoring and management of SDB. In the second case (the second bullet point), RMAN incremental backup and transportable tablespace are used. In the fourth case (the fourth bullet point), the application can either reconnect or access read-only. Oracle Database 18c: New Features for Administrators A - 23
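For reference, the manual chunk maintenance mentioned in the first bullet is driven from GDSCTL; a hedged sketch (the chunk number and shard names are placeholders) looks like this:
GDSCTL> move chunk -chunk 7 -source sh1 -target sh3
GDSCTL> split chunk -chunk 7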

24 Sharding Deployment Outline: DBA Steps
Create users and groups on all host servers. Perform an Oracle-database-software-only installation on the shard catalog server and save a response file. Perform silent installations of the Oracle database software only on all the shard hosts and the additional shard catalog host. Install the Oracle Global Service manager software on all the shard director hosts. Create a non-container database by using DBCA on the shard catalog host with Oracle Managed Files (required). Configure the remote scheduler on the shard catalog host. Register the remote scheduler on each shard host with the shard catalog host. Oracle Sharding architecture uses separate server hosts for the shard catalog, shard directors, and shards. The number of shards supported in a given sharded database (SDB) is 1,000. Deploying a sharded database can be a lengthy process because the Oracle software is installed separately on each server host. The slide presents a very high-level overview of the steps that are necessary to deploy Oracle Sharding. For detailed information, see Oracle Database Administrator’s Guide 12c Release 2 (12.2). Oracle Database 18c: New Features for Administrators A - 24

25 Sharding Deployment Outline: DBA Steps
Use GDSCTL on the shard catalog host to create a shard catalog. Use GDSCTL on the shard catalog host to create and start the shard directors. Create additional shard catalogs in a different region for high availability. Define the primary shardgroup (region) by using GDSCTL connected to the shard director host. Define the Active Data Guard standby shardgroup by using GDSCTL connected to the shard director host. Define each shard host as belonging to the primary or standby shardgroup. The slide continues with the very high-level overview of the steps that are necessary to deploy Oracle Sharding. For detailed information, see Oracle Database Administrator’s Guide 12c Release 2 (12.2). Oracle Database 18c: New Features for Administrators A - 25
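A minimal GDSCTL sketch of the catalog, shardgroup, and shard definition steps above is shown below, assuming system-managed sharding. Host names, credentials, and region names are placeholders rather than values from this deck; os_cred is assumed to have been registered beforehand with ADD CREDENTIAL, and the exact option set varies by release.
GDSCTL> create shardcatalog -database cathost:1521:shardcat -user gsm_admin/gsm_admin_pwd -sharding system -region region1,region2
GDSCTL> add shardgroup -shardgroup shgrp1 -deploy_as primary -region region1
GDSCTL> add shardgroup -shardgroup shgrp2 -deploy_as active_standby -region region2
GDSCTL> create shard -shardgroup shgrp1 -destination shard-host-01 -credential os_cred
GDSCTL> create shard -shardgroup shgrp2 -destination shard-host-04 -credential os_cred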

26 Sharding Deployment Outline: DBA Steps
Use GDSCTL connected to the shard director host to run the DEPLOY command, which: Creates all primary and standby shard databases using DBCA Enables archiving and flashback for all shards Configures Data Guard Broker with Fast-Start Failover enabled Starts observers on the standby group’s shard director Use GDSCTL to add and start a global service that runs on all primary shards. Use GDSCTL to add and start a global service for read-only workloads on all standby shards. Use SQL*Plus connected to the shard catalog database to design the sharded schema model (developer steps). The slide continues with the very high-level overview of the steps that are necessary to deploy Oracle Sharding. For detailed information, Oracle Database Administrator’s Guide 12c Release 2 (12.2). Oracle Database 18c: New Features for Administrators A - 26
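A hedged sketch of the corresponding GDSCTL commands follows; the service names are placeholders for illustration.
GDSCTL> deploy
GDSCTL> add service -service oltp_rw_srvc -role primary
GDSCTL> start service -service oltp_rw_srvc
GDSCTL> add service -service oltp_ro_srvc -role physical_standby
GDSCTL> start service -service oltp_ro_srvc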

27 Presentation Objectives
Database Sharding, 18c Sharding Enhancements, Conclusion

28 System-Managed and Composite Sharding Methods
Only two methods are supported in 12.2: System-Managed Sharding: Data is automatically distributed across shards using partitioning by consistent hash. Composite Sharding: Data is first partitioned by list or range across multiple shardspaces, and then further partitioned by consistent hash across multiple shards in each shardspace. Introduced in Oracle Database 12c Release 2, Oracle Sharding provided two methods of sharding data: system-managed sharding and composite sharding. System-managed sharding is a sharding method that does not require the user to specify a mapping of data to shards. Data is automatically distributed across shards using partitioning by consistent hash. The partitioning algorithm evenly and randomly distributes data across shards. The distribution used in system-managed sharding is intended to eliminate hot spots and provide uniform performance across shards. Oracle Sharding automatically maintains balanced distribution of data when shards are added to or removed from an SDB. Consistent hash is a partitioning strategy that is commonly used in scalable distributed systems. It is different from traditional hash partitioning. With traditional hashing, the bucket number is calculated as HF(key) % N, where HF is a hash function and N is the number of buckets. This approach works well if N is constant, but requires reshuffling of all data when N changes. More advanced algorithms, such as linear hashing, do not require rehashing of the entire table to add a hash bucket, but they impose restrictions on the number of buckets (such as it can only be a power of 2), and on the order in which the buckets can be split. The implementation of consistent hashing that is used in Oracle Sharding avoids these limitations by dividing the possible range of values of the hash function (for example, from 0 to 2^32) into a set of N adjacent intervals, and assigning each interval to a chunk. In this example, the SDB contains 1024 chunks, and each chunk gets assigned a range of 2^22 hash values. Therefore, partitioning by consistent hash is essentially partitioning by the range of hash values.

29 User-Defined Sharding Method
18c This method enables users to define LIST- or RANGE-based sharding. Oracle Database 18c introduces the user-defined sharding method that lets you explicitly specify the mapping of data to individual shards. It is used when, because of performance, regulatory, or other reasons, certain data needs to be stored on a particular shard, and the administrator must have full control over moving data between shards. Another advantage of user-defined sharding is that, in case of planned or unplanned outage of a shard, you know exactly what data is not available. The disadvantage of user-defined sharding is the need for the database administrator to monitor and maintain balanced distribution of data and workload across shards. With user-defined sharding, a sharded table can be partitioned by range or list. There is no tablespace set defined for user-defined sharding. Each tablespace has to be created individually and explicitly associated with a shardspace. A shardspace is a set of shards that store data that corresponds to a range or list of key values. As with system-managed sharding, tablespaces created for user-defined sharding are assigned to chunks. However, no chunk migration is automatically started when a shard is added to the SDB. The user needs to execute the MOVE CHUNK command for each chunk that needs to be migrated. GDSCTL CREATE SHARDCATALOG supports user-defined sharding with the value USER in the -sharding option. The SPLIT CHUNK command, which is used to split a chunk in the middle of the hash range for system-managed sharding, is not supported for user-defined sharding. You must use the ALTER TABLE SPLIT PARTITION statement to split a chunk. SQL> CREATE SHARDED TABLE accounts (id NUMBER, account_nb NUMBER, cust_id NUMBER, branch_id NUMBER, state VARCHAR(2), status VARCHAR2(1)) PARTITION BY LIST (state) ( PARTITION p_northwest VALUES ('OR', 'WA') TABLESPACE ts1, PARTITION p_northeast VALUES ('NY', 'VM', 'NJ') TABLESPACE ts5, PARTITION p_southeast VALUES ('FL', 'GA') TABLESPACE ts6 );
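Putting the pieces together, a hedged sketch of the setup that precedes the CREATE SHARDED TABLE above might look like the following. The shardspace names and the catalog connect details are assumptions for illustration; the tablespaces match those used in the slide's partition list.
GDSCTL> create shardcatalog -database cathost:1521:shardcat -user gsm_admin/gsm_admin_pwd -sharding user
GDSCTL> add shardspace -shardspace shspace_west
GDSCTL> add shardspace -shardspace shspace_east

SQL> CREATE TABLESPACE ts1 IN SHARDSPACE shspace_west;
SQL> CREATE TABLESPACE ts5 IN SHARDSPACE shspace_east;
SQL> CREATE TABLESPACE ts6 IN SHARDSPACE shspace_east;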

30 Support for PDBs as Shards
In 12c, sharded databases must consist of shard catalogs and shards that are single-instance databases or Oracle RAC-enabled stand-alone databases; CDBs are not supported. In 18c, a shard and the shard catalog can each be a single PDB in a CDB. The GDSCTL ADD SHARD command includes the -cdb option, and there are new GDSCTL commands: ADD CDB, MODIFY CDB, REMOVE CDB, and CONFIG CDB. To support consolidation of databases on under-utilized hardware, for ease of management, or for geographical business requirements, you can use single PDBs in CDBs as shard databases. The GDSCTL command ADD SHARD is extended with the -cdb option, and the new commands ADD CDB, MODIFY CDB, CONFIG CDB, and REMOVE CDB are implemented so that Oracle Sharding can support a multitenant architecture. The GDSCTL command ADD CDB is used to add a pre-created CDB to the shard catalog. The GDSCTL ADD SHARD command, extended with the -cdb option in 18c, is used to add shards, which are PDBs contained within a CDB, to the sharded database upon deployment. Use the MODIFY CDB command to change the metadata of the CDB in the shard catalog. Use the REMOVE CDB command to remove a CDB from the shard catalog; removing a CDB does not destroy it. Use the CONFIG CDB command to display information about the CDB in the shard catalog. Oracle Data Guard supports replication only at the CDB level. The existing sharding architecture allows replicated copies of the sharded data for high availability, and you can optionally configure and use Data Guard to create and maintain these copies. Data Guard does not currently support replication at the PDB level; it can only replicate an entire container. Information about migrating single-instance shards to PDBs can be found in the Oracle Database Using Oracle Sharding 18c guide in Oracle Help Center. GDSCTL> ADD CDB -connect db11 -pwd GSMUSER_password GDSCTL> ADD SHARD -cdb db11 -connect connect_string -shardgroup shgrp1 -deploy_as active_standby -pwd GSMUSER_password

31 Improved Oracle GoldenGate Support
In 12c: split chunks are not supported. In 18c: split chunk support, and automatic CDR support for tables with unique indexes/constraints. Enhancements in Oracle GoldenGate 13c were introduced to provide support for Oracle Sharding high availability, but there were some limitations. In 18c, GoldenGate now supports the GDSCTL SPLIT CHUNK command. Auto CDR was introduced in Oracle Database 12.2 (and Oracle GoldenGate 12.3) to automate the conflict detection and resolution configuration in active-active GoldenGate replication setups. However, Auto CDR was allowed only on tables with primary keys. In Oracle Database 18c, this restriction is relaxed, and Auto CDR is supported on tables with just unique keys/indexes but no primary keys.

32 Query System Objects Across Shards
In 12c: shards are managed individually, and there are no aggregate views from all shards. In 18c: the SHARDS() clause and SHARD_ID column, query performance views, and statistics collection. SQL> SELECT sql_text, shard_id FROM SHARDS(sys.v$sql) a WHERE a.sql_id = '1234'; In Oracle Database 12c Release 2, to perform maintenance operations, you had to go to each database individually. Easy, centralized diagnostics collection from all of the shards was not available. With Oracle Database 18c, you can use the SHARDS() clause to query Oracle-supplied tables to gather performance, diagnostic, and audit data from V$ views and DBA_* views. The shard catalog database can be used as the entry point for centralized diagnostic operations using the SQL SHARDS() clause. The SHARDS() clause allows you to query the same Oracle-supplied objects, such as V$, DBA/USER/ALL views, dictionary objects, and tables, on all of the shards and return the aggregated results. As shown in the examples, an object in the FROM part of the SELECT statement is wrapped in the SHARDS() clause to specify that this is not a query to a local object, but to objects on all shards in the sharded database configuration. A virtual column called SHARD_ID is automatically added to a SHARDS()-wrapped object during execution of a multi-shard query to indicate the source of every row in the result. The same column can be used in a predicate for pruning the query. A query with the SHARDS() clause can be executed only on the shard catalog database. SQL> SELECT shard_id, callspersec FROM SHARDS(v$servicemetric) WHERE service_name LIKE 'oltp%' AND group_id = 10; SQL> SELECT table_name, partition_name, blocks, num_rows FROM SHARDS(dba_tab_partition) p WHERE p.table_owner = :1;

33 Consistency Levels for Multi-Shard Queries
In 12c, multi-shard queries always used SCN synchronization and were resource intensive. 18c introduces a new initialization parameter: MULTISHARD_QUERY_DATA_CONSISTENCY. SQL> ALTER SYSTEM SET MULTISHARD_QUERY_DATA_CONSISTENCY = delayed_standby_allowed SCOPE=SPFILE; You may want to specify different data consistency levels for some multi-shard queries because, for example, it may be desirable for some queries to avoid the cost of SCN synchronization across multiple shards, and these shards could be globally distributed. Another use case is when you are using standbys for replication and it is acceptable to have slightly stale data for multi-shard queries; the results could be fetched from the primary or its standbys. A new user-visible database parameter, MULTISHARD_QUERY_DATA_CONSISTENCY, has been added in Oracle Database 18c to specify the consistency level for multi-shard queries. The parameter can have one of the following values: strong (default): With this setting, SCN synchronization is performed across all shards, and data is consistent across all shards. This setting provides global consistent read capability. shard_local: With this setting, SCN synchronization is not performed across all shards. Data is consistent within each shard. This setting provides the most current data. delayed_standby_allowed: With this setting, SCN synchronization is not performed across all shards. Data is consistent within each shard. This setting allows data to be fetched from Data Guard standby databases when possible (for example, depending on load balancing), and may return stale data from standby databases. The default mode is strong, which performs SCN synchronization across all shards. The other modes skip SCN synchronization. The delayed_standby_allowed mode allows fetching data from the standbys as well, depending on load balancing and so on, and thus may return stale data. This parameter can be set either at the system level or at the session level. See the Oracle Database Reference Guide for more information about MULTISHARD_QUERY_DATA_CONSISTENCY usage.
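Because the parameter is also session-modifiable, a per-session override is possible, for example (the value shown is purely illustrative):
SQL> ALTER SESSION SET MULTISHARD_QUERY_DATA_CONSISTENCY = shard_local;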

34 Sharding Support for JSON, LOBs, and Spatial Objects
System-managed sharded databases: create a tablespace set for the LOBs, and include that tablespace set in the parent table's CREATE statement. SQL> CREATE TABLESPACE SET lobtss1; LOB is a widely used, first-class data type in Oracle Database. Release 18c enables the use of LOBs, JSON, and spatial objects in an Oracle Sharding environment, which is useful for applications that use these data types where storage in sharded tables would facilitate business requirements. This release enables JSON operators that generate temporary LOBs, large JSON documents (those that require LOB storage), spatial objects, indexes and operators, and persistent LOBs to be used in a sharded environment. In a system-managed sharded database, you must specify a tablespace set for the LOBs, and then include it in the CREATE SHARDED TABLE statement for the parent table, as shown in the examples here. SQL> CREATE SHARDED TABLE customers (CustId VARCHAR2(60) NOT NULL, … image BLOB, CONSTRAINT pk_customers PRIMARY KEY (CustId), CONSTRAINT json_customers CHECK (CustProfile IS JSON)) TABLESPACE SET TSP_SET_1 LOB(image) STORE AS (TABLESPACE SET LOBTSS1) PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO;

35 Sharding Support for JSON, LOBs, and Spatial Objects
Composite sharded databases: create tablespace sets for the LOBs, and include those tablespace sets in the parent table's CREATE statement. SQL> CREATE TABLESPACE SET LOBTSS1 IN SHARDSPACE cust_america ... ; SQL> CREATE TABLESPACE SET LOBTSS2 IN SHARDSPACE cust_europe ... ; In a composite sharded database, you must specify a tablespace set for each shardspace for the LOBs, and then include them in the CREATE SHARDED TABLE statement for the parent table, as shown in the examples in the slide. SQL> CREATE SHARDED TABLE customers ( CustId VARCHAR2(60) NOT NULL, … image BLOB, CONSTRAINT pk_customers PRIMARY KEY (CustId), CONSTRAINT json_customers CHECK (CustProfile IS JSON)) PARTITIONSET BY LIST (GEO) PARTITION BY CONSISTENT HASH (CustId) PARTITIONS AUTO (PARTITIONSET america VALUES ('AMERICA') TABLESPACE SET tsp_set_1 LOB(image) STORE AS (TABLESPACE SET LOBTSS1), PARTITIONSET europe VALUES ('EUROPE') TABLESPACE SET tsp_set_2 LOB(image) STORE AS (TABLESPACE SET LOBTSS2));

36 Sharding Support for JSON, LOBs, and Spatial Objects
User-defined sharded databases: create individual tablespaces for the LOBs, and include those tablespaces in the parent table's CREATE statement. SQL> CREATE TABLESPACE lobts1 … IN SHARDSPACE shspace1; SQL> CREATE TABLESPACE lobts2 … IN SHARDSPACE shspace2; In a user-defined sharded database, you must specify a tablespace, not a tablespace set, for each shardspace for the LOBs, and then include them in the CREATE SHARDED TABLE statement for the parent table, as shown in the examples in the slide. SQL> CREATE SHARDED TABLE customers (CustId VARCHAR2(60) NOT NULL, … image BLOB, CONSTRAINT pk_customers PRIMARY KEY (CustId), CONSTRAINT json_customers CHECK (CustProfile IS JSON)) PARTITION BY RANGE (CustId) ( PARTITION ck1 VALUES LESS THAN ('m') TABLESPACE ck1_tsp LOB(image) STORE AS (TABLESPACE lobts1), PARTITION ck2 VALUES LESS THAN (MAXVALUE) TABLESPACE ck2_tsp LOB(image) STORE AS (TABLESPACE lobts2));

37 Improved Multi-Shard Query Support
In 12c, there are restrictions on query shapes, and only system-managed sharding is supported. In 18c: all query shapes supported; system-managed, user-defined, and composite sharding methods supported; centralized execution plan display available; Oracle-supplied objects in queries; multi-column sharding keys supported; SET operators supported. In Oracle Database 12.2, there were several restrictions on the query shapes that could be used in queries over multiple shards, and multi-shard queries were supported only in sharded databases using the system-managed sharding method. The restrictions lifted in Oracle Database 18c are: support for composite and user-defined sharding; multi-shard query execution plan display; support for all query shapes, such as views, subqueries, joins on non-sharding columns, and so on; support for Oracle-supplied tables/views (using the SHARDS() clause and SHARD_ID) and PL/SQL functions; support for multi-column sharding keys; and use of SET operators.
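As an illustration of the multi-column sharding key support, a hedged sketch of such DDL might look like the following; the table and column names are invented for this example, and ts1 is the tablespace set used earlier in the deck. Treat it as a sketch rather than syntax confirmed by this presentation.
SQL> CREATE SHARDED TABLE payments
     ( CustNo NUMBER NOT NULL
     , PmtNo  NUMBER NOT NULL
     , Amount NUMBER(10,2)
     , CONSTRAINT PmtPK PRIMARY KEY (CustNo, PmtNo)
     )
     PARTITION BY CONSISTENT HASH (CustNo, PmtNo)   -- composite (multi-column) sharding key
     PARTITIONS AUTO
     TABLESPACE SET ts1;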

38 Oracle Sharding Documentation
In 12c, the Oracle Sharding documentation was contained in the Oracle Database Administrator’s Guide, Part VII, “Sharded Database Management.” In 18c, the Oracle Sharding documentation has its own book, Oracle Database Using Oracle Sharding, included in the Oracle Database documentation library in Oracle Help Center. In Oracle Database 18c, the Oracle Sharding documentation has been moved from Part VII of the Oracle Database Administrator’s Guide to its own new book, called Oracle Database Using Oracle Sharding, in the Oracle Database documentation library in Oracle Help Center.

39 Presentation Objectives
Database Sharding, 18c Sharding Enhancements, Conclusion

40 Experience Oracle University Learning Subscriptions. Visit education
Experience Oracle University Learning Subscriptions! Visit education.oracle.com/oowtrial Free Trial Subscription: Special invitation from Oracle University to attendees of Oracle OpenWorld or Code One Anytime, anywhere access Continually updated training on Oracle products and technologies. Experience the new Unlimited Product Learning Subscription Instructor notes Details: UPLS Trial: 1 subscription per attendee Ends December 21, 2018 or after attendee consumes 5 hours of learning on the trial subscription. Availability: Go to the education.oracle.com/oowtrial to activate your trial subscription. Oracle Confidential – Internal/Restricted/Highly Restricted

41 Keep Learning with Oracle University
1300+ Training Courses 2000+ Courses: Cloud Technology Applications Industries 200,000 Students trained per year 450+ Training On Demand courses 20 Learning Subscriptions 2 Million Oracle Certified Professionals 500+ Education Partnerships education.oracle.com


43 Are You Up For the Oracle University Zip Labs Challenge at Code One?
Join us in San Francisco, California at the Moscone West Center where you can compete to win a prize at the “Oracle University Zip Labs Challenge Booth.” When: Monday, October 26th – Open from 9:00am through 4:30pm Tuesday, October 27th – Open from 9:00am through 4:30pm Wednesday, October 28th – Open from 9:00am through 3:00pm What is the Oracle University Zip Labs Challenge? The Oracle University Zip Labs Challenge is a collection of labs, each minute long. Zip Labs guide you through a sequence of steps to accomplish a specific task within the Oracle Cloud Platform. It’s an opportunity to get started experiencing for yourself how some of Oracle’s new technologies work. You can select from labs in the categories covering: Virtual Machines: Creating a VM in OCI Autonomous Data Warehouse (ADW): Provisioning, Connecting to SQL, Machine Learning Autonomous Transaction Processing (ATP): Provisioning, Connecting to SQL, Scaling Great Learning. Great Technology. Great Prizes COME SEE WHAT ALL THE EXCITEMENT IS ABOUT AS YOU WORK THROUGH EXPERT DEVELOPED LABS AND CLIMB HIGHER ON OUR LEADERBOARD THROUGHOUT THE DAY – COMPETING WITH OTHER CONTESTANTS It’s simple to find us. Go to the 2nd floor of Moscone West.  As you complete labs and quizzes, you’ll earn points to boost your leaderboard standing.  At the end of each day, the top 5 winners win a fabulous prize.  So if you are up for the challenge – then we hope you drop by to showcase your skills and curiosity! Looking forward to seeing you there.  Confidential – Oracle Internal/Restricted/Highly Restricted

44 Thank You We would like to thank you for taking the time to attend our presentation, What is Oracle Database Sharding and What Is It Used For?
