Published by Cláudia Deluca Rios. Modified over 6 years ago.
1
Relaxed Currency and Consistency: How to Say “Good Enough” in SQL
Hongfei Guo, University of Wisconsin
Per-Åke Larson, Microsoft Research
Raghu Ramakrishnan, University of Wisconsin
Jonathan Goldstein, Microsoft Research

Good afternoon! My name is Hongfei Guo, and I am from the University of Wisconsin. Today I am going to talk about supporting relaxed currency and consistency (C&C) in a replica environment. This is joint work with my advisor, Raghu Ramakrishnan, and with Paul Larson and Jonathan Goldstein from MSR.
2
Middle-tier Database Caching Scenario
[Figure: middle-tier caching architecture. Application servers (IIS) submit queries and updates to the caching databases. The caching databases replicate data from the backend database (SQL Server) and send it remote queries and updates. Updates are propagated from the backend through the Log Reader and Distributor (SQL Server Replication).] (Figure from [LGZ, ICDE04])

To motivate this work, let’s look at an example of middle-tier database caching. In this scenario we have a backend database. For the sake of availability, scalability, and performance, several caching databases are deployed between the backend server and the application servers. Each cache replicates part of the data from the backend. Applications submit queries and updates to the cache. For queries, if the required data is available in the cache, the query can be answered locally; otherwise, remote queries are sent to the backend. For updates, the cache always forwards them to the backend. Changes are propagated to the cache asynchronously, so data in the cache can obviously be out of date. What is the problem here?

30-Nov-18 Hongfei Guo Univ. of Wisconsin
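The routing rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Backend` and `Cache`, the dictionary-based storage, and the explicit `propagate` step are hypothetical stand-ins, not part of MTCache or SQL Server Replication.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for the backend database: key -> value."""
    data: dict = field(default_factory=dict)


@dataclass
class Cache:
    """Hypothetical stand-in for a caching database holding part of the backend data."""
    backend: Backend
    local: dict = field(default_factory=dict)

    def query(self, key):
        # Answer locally when the data is cached; otherwise send a remote query.
        if key in self.local:
            return self.local[key], "local"
        return self.backend.data.get(key), "remote"

    def update(self, key, value):
        # Updates are always forwarded to the backend, never applied locally.
        self.backend.data[key] = value

    def propagate(self):
        # Asynchronous propagation, modeled here as an explicit step that
        # refreshes only the keys the cache already replicates.
        for key in self.local:
            self.local[key] = self.backend.data[key]


backend = Backend({"book:1": "v1", "book:2": "v1"})
cache = Cache(backend, local={"book:1": "v1"})
cache.update("book:1", "v2")
stale_answer = cache.query("book:1")   # served locally, before propagation
cache.propagate()
fresh_answer = cache.query("book:1")   # served locally, after propagation
remote_answer = cache.query("book:2")  # not cached, so answered remotely
```

Between `update` and `propagate` the cache still returns the old value, which is exactly the staleness problem the talk goes on to address.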
3
Problem: How to tell whether the cached data is “good enough” for an application?
NO data freshness requirements from the apps!
NO data freshness guarantees from the caching DBMS!

In this particular scenario, and in any replication scenario that uses asynchronous updates, we face the problem of matching data freshness to the application’s needs. Currently, there is no way for applications to specify their freshness requirements, and consequently the cache does not know what level of guarantee to provide.
4
Big Picture
[Figure: an application (e.g., CNN, eBay) reaches replicas through a caching DBMS, caching middleware, or application-specific caching, all in front of the backend DBMS.]

Nowadays, multi-tier architecture is common practice, and applications use replicas routinely. In many cases, however, cache middleware or application-specific code is used to manage the replicas and answer queries from the application. Why is an off-the-shelf caching DBMS not commonly used? Because DBMSs currently lack an understanding of applications’ special needs for replicas. This is analogous to a DBMS using its own buffer pool instead of the virtual memory provided by the OS. If the caching DBMS understood applications better, more common tasks could be moved into the DBMS. In a sense, this work is a first step toward that ambitious goal: it opens the door of communication to applications.

The middleware and application-specific approaches have three drawbacks:
They are all hand-crafted.
They require modifying the applications, e.g., to route queries to the desired replica whenever the underlying strategy changes.
They give no guarantees on data quality (you get what you get).
5
Our Contributions
Allow queries to specify relaxed currency and consistency (C&C) constraints
  Extend SQL to support C&C constraints
  Semantics of C&C constraints
Efficiently enforce C&C constraints in caching DBMS
  Prototyped in SQL Server (MTCache)
  Experiments show only small overhead
First query-centric approach in DBMS!

How do we solve the problem? From the application’s side, we … To this end … From the DBMS side, we propose techniques to efficiently enforce such constraints. … To this end, we … Experiments show only a small run-time overhead. Here I want to emphasize that, compared to previous work, we take a query-centric approach. That is, instead of providing only an average guarantee for the whole workload, we allow each individual query to specify its own C&C requirements, and the DBMS guarantees that those requirements are met. I have used the terms currency and consistency; what do I mean by them?
6
Terminologies (informal)
Currency: The elapsed time since this copy became stale
Consistency: A query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database
C&C: Currency & Consistency

Intuitively, currency describes how old an object is. We define it as … Consistency describes the relationship among a group of objects.
7
Roadmap
Background
Expressing C&C constraints
Enforcing C&C constraints
Experiments and analysis
Conclusion & future work

I have described the background of this work. In the rest of the talk, I will explain how to express and enforce C&C constraints, and I will show some experimental results and conclusions. Next: how to express C&C constraints. Before answering that question, we need to know what kinds of requirements applications want. We allow each query to specify two types of data quality requirements: a currency requirement and a consistency requirement. I will illustrate them with examples.
8
Currency Requirements
Example 1: In a mid-tier caching setting, the caching database keeps Books info.
Customer A is browsing: it is OK if the data is no more than 3 days out of sync (quick response time is preferred).
Customer B is about to purchase: he wants the data to be exactly current (high data quality is preferred).

Let’s first look at the currency requirement. In this example, the middle-tier cache has the Books table. Customer A … For him, quick response time is preferred. In contrast, Customer B … For him, … This example shows that … different currency requirements. This argues for our query-centric approach.
10
Currency Requirements
Different apps may have different currency requirements for the same query
11
Consistency Requirements
Example 2:
SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = 'Databases'
[Figure: sample tables Books (bid, title, author), with rows for Raghu and Ullman, and Reviews (rid, text), together with the joined query result (bid, title, author, rid, text).]

Different apps may have different consistency requirements for the same query:
The whole query result be consistent
Each book be consistent with its reviews
Books be consistent & Reviews be consistent

In this example, we have Books and Reviews in the cache. The query asks for all the database books and their reviews. The incentive for relaxed consistency requirements is not as intuitive as in the currency case. For now, just keep in mind that, everything else being equal, the more relaxed a query’s consistency requirement is, the more likely it is that local data can be used to answer the query. Having said that, let’s return to the example. User A is about to purchase; he wants … Here, high data quality is preferred. User B is browsing; he doesn’t mind if the reviews are a little older than the books, so Books and Reviews don’t have to be mutually consistent. User C is also browsing; he wants all the rows about the same book to be consistent, but rows for different books don’t have to be mutually consistent. For both B and C, quick response time is preferred. Here, we see … Again, this argues for our query-centric approach. So a query does indeed need to specify its own customized C&C requirements. Next, I will explain how, by example.
12
Proposed SQL Syntax
[Figure: the sample tables Books (bid, title, author) and Reviews (rid, text) and the joined result, with the currency clause annotated by its three components: currency bound, consistency class, and group by.]

SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = 'Databases'
followed by one of these currency clauses:
CURRENCY BOUND 10 min ON (B, R)
CURRENCY BOUND 10 min ON (B), 30 min ON (R)
CURRENCY BOUND 10 min ON (B, R) BY B.bid

We extend the SQL syntax to include a currency clause for each query block. There are three components in this clause. The pair of parentheses … So this currency clause says: the whole … Let’s look at another example. This currency clause says: in the query results, … In this example we see the third component, the group-by phrase. It can be added to a consistency class to specify that the scope of the consistency class is only within each group. This currency clause says: if we group the query result by book id, each group has to be consistent, but different groups don’t have to be mutually consistent.
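To make the three clause variants concrete, here is a small Python sketch of how their consistency part could be modeled and checked against a query result. Everything in it is an assumption made for illustration: the `ConsistencyClass` type, the idea of tagging each result row with the snapshot id of each input table, and the sample rows. It checks only consistency, not the currency bound, and it is not the paper's formal semantics.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConsistencyClass:
    """One 'bound ON (tables) [BY column]' item of a currency clause (hypothetical model)."""
    bound_min: int
    tables: Tuple[str, ...]
    group_by: Optional[str] = None


def class_is_consistent(rows, cls):
    """Within each group, all tables of the class must come from one snapshot."""
    snapshots_per_group = {}
    for row in rows:
        key = row[cls.group_by] if cls.group_by else None
        seen = snapshots_per_group.setdefault(key, set())
        for table in cls.tables:
            seen.add(row["snapshot"][table])
    return all(len(seen) == 1 for seen in snapshots_per_group.values())


def clause_is_consistent(rows, clause):
    return all(class_is_consistent(rows, cls) for cls in clause)


# The three clauses from the slide.
whole_result = [ConsistencyClass(10, ("B", "R"))]
separate = [ConsistencyClass(10, ("B",)), ConsistencyClass(30, ("R",))]
per_book = [ConsistencyClass(10, ("B", "R"), group_by="bid")]

# Hypothetical results: each row records which snapshot each table's data came from.
rows_a = [  # each book matches its reviews, but the two books differ
    {"bid": 1, "snapshot": {"B": 5, "R": 5}},
    {"bid": 2, "snapshot": {"B": 7, "R": 7}},
]
rows_b = [  # Books from one snapshot, Reviews from an older one
    {"bid": 1, "snapshot": {"B": 5, "R": 4}},
    {"bid": 2, "snapshot": {"B": 5, "R": 4}},
]

for name, clause in [("whole", whole_result), ("separate", separate), ("per book", per_book)]:
    print(name, clause_is_consistent(rows_a, clause), clause_is_consistent(rows_b, clause))
```

In this sketch `rows_a` satisfies only the per-book clause and `rows_b` satisfies only the clause with separate classes, which shows why a more relaxed clause gives the cache more ways to answer from local data.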
13
Roadmap
Background
Expressing C&C constraints
Enforcing C&C constraints
Experiments and analysis
Conclusion & future work

Now, with the currency clause, a query can specify its C&C constraints. Next I will explain how we enforce such constraints.
14
Extension to MTCache Framework
[Figure: the MTCache framework [LGZ04]. Queries, including queries with relaxed C&C requirements, enter the caching DBMS, which contains the shadow databases, query optimizer, currency region metadata, local materialized views, execution engine, and heartbeat tables. The caching DBMS is connected to the backend DBMS, and results are returned from both.]

We integrate C&C checking into the MTCache framework. MTCache is a transparent middle-tier cache: a prototype database caching server built on MS SQL Server. I will briefly explain how it works. The dashed box represents the caching DBMS, which is connected to a backend server. The shadow database is a copy of the catalog information from the backend server; it enables the local optimizer to generate distributed plans. When queries come in, the optimizer generates the best plan based purely on cost, and local data, remote data, or both may be used to produce the query results. We extend this framework to enforce C&C constraints as follows. First, we add some metadata to the caching DBMS. Then we extend the optimizer to do C&C checking. Thus, when a query with C&C requirements is submitted, we guarantee that the results satisfy those requirements.
15
Extension to MTCache Framework
[Figure: the extended MTCache framework, highlighting the currency region metadata and heartbeat tables inside the caching DBMS.]

Let’s first look at the currency region metadata and heartbeat tables we add to the cache.
16
C&C Tracking Mechanism
Consistency tracking: currency region (CR)
  The unit of update propagation
  Data mutually consistent all the time
  Properties, e.g., est. delay, est. interval
Currency tracking: heartbeat table
[Figure: views V1 to V5 at the backend and in the cache, partitioned into currency regions CR1 and CR2. The backend heartbeat table has columns Cid and Timestamp with one row per currency region; each currency region in the cache holds a local copy of its row.]

Suppose we have five views, V1 to V5, in the cache. To decide whether they are good enough for a query, we need to keep track of their C&C status. How do we track mutual consistency? The cache administrator decides on a partitioning of the cache. In this example, the left three views are in one partition and the right two views are in another. Each partition is called a currency region. We require that a currency region is the unit …, and thus all the views within a CR are mutually consistent. This provides a simple way to check consistency. Each currency region also has some characteristics, for example … We will explain later how they are used.

Now we know which views are mutually consistent. The next question is how to bound the currency of a CR. Our answer is the heartbeat table mechanism. At the backend we have a global heartbeat table with two columns, currency region id and timestamp, and one row per CR. Each currency region keeps a local copy of its associated row. For each row in the backend, a stored procedure periodically updates the timestamp to the current time. In this example, the timestamp for CR1 is set every 10 minutes. When updates are propagated to the currency region, the new timestamp is also copied to the cache. This provides a way to bound the currency of local data. Suppose it is now 1:00; then we know the data in CR1 is no more than 30 minutes old.
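The heartbeat arithmetic from the notes can be sketched as follows. The function names and the calendar date are hypothetical; the 12:30 local timestamp is the value implied by the "no more than 30 minutes old at 1:00" statement, and the sketch assumes the backend and cache clocks can be compared directly.

```python
from datetime import datetime, timedelta


def staleness_bound(local_heartbeat, now):
    """Upper bound on how old the currency region's data can be.

    The local heartbeat row was copied together with the region's updates,
    so the region reflects the backend at least as of that timestamp.
    """
    return now - local_heartbeat


def within_bound(local_heartbeat, now, bound):
    """True when the region is guaranteed to satisfy the currency bound."""
    return staleness_bound(local_heartbeat, now) <= bound


day = datetime(2018, 11, 30)                      # arbitrary date, for illustration only
cr1_heartbeat = day.replace(hour=12, minute=30)   # local copy of CR1's heartbeat row
now = day.replace(hour=13, minute=0)

cr1_bound = staleness_bound(cr1_heartbeat, now)
ok_for_30_min = within_bound(cr1_heartbeat, now, timedelta(minutes=30))
ok_for_10_min = within_bound(cr1_heartbeat, now, timedelta(minutes=10))
```

Here a query with a 30-minute bound on CR1 could use the local data, while a query with a 10-minute bound could not.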
17
Extension to MTCache Framework
[Figure: the extended MTCache framework. For queries with relaxed C&C requirements, the query optimizer produces the best plan that satisfies the consistency requirements and includes run-time currency checking.]
18
Extension to the Optimizer
Compile-time consistency checking
Run-time currency checking
Cost estimation
19
Consistency Checking
Enforced at optimization time
Immediately prune a sub-plan if it violates consistency constraints
Q1: σ(Books ⋈ Reviews) CURRENCY 5 ON (Books, Reviews)
[Figure: a sub-plan that merge-joins a local scan of Reviews with a remote query on Books.]

Animation with color. Use the example from the poster. Emphasize that Reviews and Books are not from the same region.
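The pruning rule on this slide can be sketched as a check that every table in a consistency class is read from the same source. This is a simplified illustration, not the optimizer's actual algorithm: the `BACKEND` marker, the dictionary describing a sub-plan, and the function name are assumptions.

```python
BACKEND = "backend"  # hypothetical marker for data read by a remote query


def violates_consistency(plan_sources, consistency_classes):
    """Return True if the sub-plan must be pruned.

    plan_sources maps each table to where the sub-plan reads it from:
    a currency region id for a local view, or BACKEND for a remote query.
    Tables in one consistency class must all come from the same source.
    """
    for tables in consistency_classes:
        sources = {plan_sources[table] for table in tables}
        if len(sources) > 1:
            return True
    return False


classes = [("Books", "Reviews")]  # from CURRENCY 5 ON (Books, Reviews)

# The sub-plan in the figure: local scan of Reviews, remote query on Books.
mixed_plan = {"Books": BACKEND, "Reviews": "CR1"}
# Alternatives that keep both tables in one source.
all_remote_plan = {"Books": BACKEND, "Reviews": BACKEND}
same_region_plan = {"Books": "CR1", "Reviews": "CR1"}

prune_mixed = violates_consistency(mixed_plan, classes)
prune_remote = violates_consistency(all_remote_plan, classes)
prune_local = violates_consistency(same_region_plan, classes)
```

Because the check needs only catalog-level information (which region each view belongs to), it can run while plans are being generated instead of at execution time.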
20
Run-time Currency Checking
When view V matches expression E
[Figure: expression E and the matching view V.]
Currency guard: Check if local view V satisfies currency requirement

Explain view matching. Explain SWU (SwitchUnion). In our special case, what is the currency guard? Note that our currency guard is at the view level instead of the query level. One advantage is that if a plan involves more than one currency guard, say two, then during execution it is possible that local data is used for one SWU but remote data is used for the other.
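A minimal sketch of a view-level currency guard driving a SwitchUnion, assuming times are plain minute counts and each branch is a callable. The function names, heartbeat values, and bound are hypothetical; the real operator lives inside the SQL Server execution engine.

```python
def currency_guard(heartbeat_min, now_min, bound_min):
    """True when the local view's staleness is within the currency bound."""
    return now_min - heartbeat_min <= bound_min


def switch_union(guard_ok, local_branch, remote_branch):
    """Run exactly one branch, chosen at execution time by the guard result."""
    return local_branch() if guard_ok else remote_branch()


now = 60                                          # minutes, hypothetical clock
heartbeats = {"cust_prj": 55, "orders_prj": 30}   # hypothetical local heartbeat values
bound = 10                                        # currency bound in minutes

# One guard per view: each SwitchUnion decides independently.
chosen = {
    view: switch_union(
        currency_guard(heartbeat, now, bound),
        lambda: "local",
        lambda: "remote",
    )
    for view, heartbeat in heartbeats.items()
}
print(chosen)
```

With these values the first view is read locally and the second falls back to the remote branch, which is the mixed outcome that a query-level guard could not produce.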
21
Cost Estimation
Cost for the SwitchUnion operator:
\[ C = p \cdot C_{local} + (1 - p) \cdot C_{remote} + C_{cg} \]
p : probability that the local branch will be used
C_{local} : cost of executing the local branch
C_{remote} : cost of executing the remote branch
C_{cg} : cost of currency checking
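The cost formula translates directly into code. The function name and the sample costs below are made up for illustration; only the formula itself comes from the slide.

```python
def switch_union_cost(p, c_local, c_remote, c_guard):
    """Expected cost of a SwitchUnion: C = p*C_local + (1 - p)*C_remote + C_cg."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p * c_local + (1 - p) * c_remote + c_guard


# Hypothetical costs: the local branch is cheap, the remote branch expensive.
always_local = switch_union_cost(1.0, c_local=2.0, c_remote=10.0, c_guard=0.5)
always_remote = switch_union_cost(0.0, c_local=2.0, c_remote=10.0, c_guard=0.5)
half_and_half = switch_union_cost(0.5, c_local=2.0, c_remote=10.0, c_guard=0.5)
```

The guard cost is paid regardless of which branch runs, so it is added outside the weighted average.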
22
Estimating p
Compute p from two variables:
f : estimated refresh interval
d : estimated minimal delay
Given the currency bound B:
\[ p = \begin{cases} 0 & \text{if } B - d \le 0 \\ (B - d)/f & \text{if } 0 < B - d \le f \\ 1 & \text{if } B - d > f \end{cases} \]
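The piecewise estimate can be sketched as a small function. It assumes the first and last cases evaluate to 0 and 1, the only values that keep p a probability and make it continuous with the middle case; the function name is hypothetical, and the sample numbers reuse the interval of 15 and delay of 5 listed for CR1 in the experimental setting.

```python
def local_probability(bound, delay, interval):
    """Estimate p, the probability that the local branch satisfies the currency bound.

    bound    : currency bound B of the query
    delay    : estimated minimal propagation delay d
    interval : estimated refresh interval f
    """
    slack = bound - delay
    if slack <= 0:
        return 0.0          # data is always older than the bound allows
    if slack > interval:
        return 1.0          # data is always fresh enough
    return slack / interval # fraction of the refresh cycle that is fresh enough


p_tight = local_probability(bound=5, delay=5, interval=15)
p_medium = local_probability(bound=10, delay=5, interval=15)
p_loose = local_probability(bound=25, delay=5, interval=15)
```

The estimate feeds the `p` term of the SwitchUnion cost, so a looser bound makes plans with local views look cheaper to the optimizer.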
23
Roadmap
Background
Expressing C&C constraints
Enforcing C&C constraints
Experiments and analysis
Conclusion & future work

So far, I have described how we enforce C&C constraints in a caching DBMS. Compared to a normal plan, our plan has to include run-time currency checking in order to guarantee the currency requirements. How much overhead does this incur? We ran experiments to answer this question.
24
Experimental Setting
Back-end hosts a TPCD database, tpcd1gh, with scale factor 1.0 (~1 GB)
Cache server has a shadow of tpcd1gh
Two local views: cust_prj, order_prj
Currency region setting:
cid interval delay views
CR1 1 15 5 cust_prj
CR2 2 10 orders_prj

1 GB TPCD. For local data, we have two simple projection views. Take out the region information; it is not used here.
25
Queries Used

Characterize the queries.
Q1: the simplest query, an index lookup on a primary key that returns a single row (the fastest possible query).
Q2: a simple, fast index nested-loop join query that returns 6 rows.
Q3: a lookup query on a non-key column that returns about 6000 rows.
What plans? For Q1 and Q3, the local view of Customer is used with a currency guard; for Q2, both local views are used, so it has two currency guards. Q1 and Q2 are designed to show the worst-case overhead, since those queries are short.
26
Overhead of Currency Guards
            Local                    Remote
            Q1      Q2      Q3       Q1      Q2      Q3
cost (ms)   0.11    0.19    2.39     0.24    0.42    0.90
cost (%)    15.25   21.30   3.66     3.59    4.31    0.41
# Rows (Q1, Q2, Q3): 1, 6, 5975

How to get the numbers? How to read this table: local case, remote case. We measure both the absolute and the relative overhead. Q1 and Q2: the absolute value is small, but it is a significant percentage because the query itself is small. Q3: the percentage is small, but the absolute value is high. Why? (For the remote case, too much variance. Appr. S level. Too much noise.) The overhead is small. We want to investigate it further.
29
Overhead Breakdown
Columns: Setup, Run, Shutdown, Total (ms, %), ~Ideal Total (ms, %)
Q1 0.04 0.06 0.01 0.11 15.25 0.07 11.51
Q2 0.09 0.19 21.30 0.10 14.32
Q3 1.99 2.39 3.66 0.16
Inherent Cost of CG = c_run + c_shutdown

Processing a query includes three phases. Setup: instantiate the plan (allocate local memory, check tables, take schema locks so that objects cannot be deleted, bind resources, etc.). Run: actual execution (acquire locks). Shutdown: clean up and free the resources, including locks. Regardless of how it is implemented, a currency guard has to be evaluated once and only once; that is the inherent cost of the currency guard. Q1 and Q2: the comparison is close. Q3: way off. We tested on beta code, and the implementation is not fully optimized. (The reason is that the SWU implementation is not efficient: its cost is proportional to the number of rows.)
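As a quick check of the inherent-cost definition on the Q1 row, assuming its first three values are the setup, run, and shutdown times in milliseconds:

```latex
\[
c_{run} + c_{shutdown} = 0.06 + 0.01 = 0.07\ \text{ms},
\qquad
c_{setup} + c_{run} + c_{shutdown} = 0.04 + 0.06 + 0.01 = 0.11\ \text{ms}
\]
```

The first sum matches the ~Ideal Total entry for Q1 and the second matches its measured total.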
30
Roadmap
Background
Expressing C&C constraints
Enforcing C&C constraints
Experiments and analysis
Conclusion & future work
31
Conclusion
Goal: provide query results with quality guarantees
Allow queries with explicit C&C constraints
Enforce C&C constraints in SQL Server (MTCache framework)

To conclude, in a replica environment we have the problem of matching data quality to the application’s needs. Our goal is to … To achieve that, we extended SQL to allow a query to specify its relaxed C&C requirements, and we integrated C&C checking into SQL Server. Now, given a query with C&C requirements, the caching DBMS guarantees that the query results satisfy those requirements.
32
Future Work
Improve current prototype
  Timeline constraints
  Finer granularity C&C constraints
C&C-aware cache management
  Does additional knowledge (C&C reqts.) buy us something?

We envision two lines of future research. So far, given a set of cached data, our techniques guarantee that the query results satisfy the C&C requirements. We did not touch on the cache management problem, that is, what to cache and how to maintain the cache. Much research has addressed this problem, but now we know more about the workload. With the additional information in the C&C requirements, can we do better?
33
Thank You! Questions?
34
Workload Shifting
[Figure: local workload (%), (a) with relaxed currency bound, (b) with increased refresh interval.]