An open source DBMS for handheld devices

Presentation on theme: "An open source DBMS for handheld devices"— Presentation transcript:

1 An open source DBMS for handheld devices
by Rajkumar Sen, IIT Bombay, under the guidance of Prof. Krithi Ramamritham

2 An open source DBMS for handheld devices
Outline
- Introduction
- Storage Management
- Query Processing
- Other issues
- Performance Evaluation
- Conclusions
12/1/2019 An open source DBMS for handheld devices

3 An open source DBMS for handheld devices
Introduction
- A resource-constrained device
  - A small computer with limited resources
  - e.g. cellphones, Simputer, Palm devices
- Data management is important
  - Increasing number of applications
  - They deal with a fair amount of data
  - Complex queries involving joins and aggregates
  - Atomicity and durability for data consistency
  - Ease of application development
- A device-resident DBMS is needed

4 An open source DBMS for handheld devices
Introduction
- Need for synchronization
  - Data from a remote server is downloaded onto the device
  - Updates happen at both places
  - Common data needs to be synchronized
- Challenges
  - Limited computing power and main memory
  - Limited stable storage
  - Resources are not uniform across devices
  - Need a system that can do the best for every device

5 An open source DBMS for handheld devices
Introduction
- Storage Management
  - Reduce storage cost to a minimum
  - Limited storage could preclude any additional index
  - The data model should try to incorporate some index information
- Query Processing
  - Memory limits the query processing capabilities
  - Minimum-memory algorithms in existing systems do not work well for complex joins and aggregates
  - Need algorithms that create in-memory indices and save aggregate values
  - Optimal memory allocation among operators

6 An open source DBMS for handheld devices
Storage Management
- Aim at compactness in the representation of data
- Existing storage models
  - Flat Storage
    - Tuples are stored sequentially. This ensures access locality but consumes space.
  - Pointer-based Domain Storage
    - Values are partitioned into domains, which are sets of unique values
    - Tuples reference the attribute value by means of pointers
    - One domain is shared among multiple attributes
- In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value.
- Can we further reduce the storage cost?

7 An open source DBMS for handheld devices
Storage Management
- ID Storage: an identifier for each of the domain values
  - The identifier is the ordinal value in the domain table
  - Store the identifier instead of the pointer
  - Use the identifier as an offset into the domain table
- Extendable IDs: the length of the identifier grows and shrinks depending on the number of domain values
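The ID Storage idea can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory list stands in for the domain table; the names `encode_column` and `decode_column` are hypothetical and do not come from the system itself.

```python
def encode_column(values):
    """Split a column into a domain table and a list of ordinal IDs.

    Assumption: the domain table is a plain list, and an ID is simply
    the position of the value in that list.
    """
    domain = []      # unique values, in first-seen order
    position = {}    # value -> ordinal ID, used only while encoding
    ids = []
    for value in values:
        if value not in position:
            position[value] = len(domain)
            domain.append(value)
        ids.append(position[value])
    return domain, ids


def decode_column(domain, ids):
    """Recover the original column by using each ID as an offset."""
    return [domain[i] for i in ids]
```

Each distinct value is stored once, and every tuple carries only a small integer that doubles as an offset into the domain table.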

8 An open source DBMS for handheld devices
Storage Management
- D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
- Starting with 1-byte identifiers, the length grows and shrinks.
- ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.
- Why not bit identifiers?
  - Storage is byte addressable.
  - Packing bit identifiers into bytes increases the storage management complexity.
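The identifier length rule can be checked with a small helper. This is a sketch under two assumptions not stated on the slide: identifiers run from 0 to D-1, and the length is rounded up to whole bytes with a floor of 1 byte. The function name `id_length_bytes` is hypothetical.

```python
def id_length_bytes(num_domain_values):
    """Bytes needed to give each of D domain values a distinct ID.

    Assumes IDs run from 0 to D-1 and that the length never drops
    below 1 byte, matching the "starting with 1-byte identifiers" rule.
    """
    if num_domain_values < 1:
        raise ValueError("need at least one domain value")
    # Bits needed to represent the largest ID, D-1 (at least 1 bit).
    bits = max(1, (num_domain_values - 1).bit_length())
    return (bits + 7) // 8   # round up to whole bytes
```

Under these assumptions the identifier grows from 1 to 2 bytes when the 257th domain value is inserted, and shrinks back when the count returns to 256.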

9 An open source DBMS for handheld devices
Storage Management
[Figure: ID Storage. The ID values of relation R index, through Positional Indexing, into a table of domain values v0 ... vn.]

10 An open source DBMS for handheld devices
Storage Management
- Ping-Pong Effect
  - At the boundaries, ID values are reorganized when the identifier length changes
  - Frequent insertions and deletions at the boundaries might result in a lot of reorganization
  - This phenomenon should be avoided
- No deletion of domain values
  - Domain structure means a future insertion might reference the deleted value
  - Do not delete a domain value even if it is not referenced
- Setting a threshold for deletion of domain values
  - Delete only if the number of deletions exceeds a threshold
  - Increase the threshold when boundaries are being crossed

11 Storage Management Primary Key-Foreign Key relationship
- Primary key: a domain in itself
- IDs for primary key values
- Values present in the child table are the corresponding primary key IDs
- The projected foreign key column forms a Join Index
[Figure: Primary Key-Foreign Key Join Index. The ID values in column S.B of the child table (relation S) index into the rows v0 ... vn of the parent table (relation R).]
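A possible reading of how the projected foreign key column acts as a join index is sketched below. It assumes rows are Python tuples and that each stored foreign key ID is the position of the matching parent row; `join_with_index` is a hypothetical name, not a function of the system.

```python
def join_with_index(child_rows, fk_ids, parent_rows):
    """Join child and parent tables through the projected foreign key column.

    fk_ids[i] is the primary key ID stored for child row i. It is used
    directly as a position in parent_rows, so no search is needed.
    """
    for child, parent_id in zip(child_rows, fk_ids):
        yield child + parent_rows[parent_id]
```

Because the lookup is positional, the join costs one pass over the child table with a direct access into the parent table per row.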

12 An open source DBMS for handheld devices
Storage Management
- ID-based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$
- Relations on a small device do not have a very high cardinality
- The above condition is true for most of the data.
- Advantages
  - (i) Considerable saving in storage cost.
  - (ii) Efficient join between parent table and child table
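As a worked illustration of the condition above, suppose (hypothetically) that a column of N = 2155 rows references a domain of D = 77 distinct values, with p = 4 bytes as the typical pointer size. These figures are chosen only for illustration, and the identifier length is assumed to be rounded up to whole bytes.

```latex
\[
\left\lceil \frac{\log_2 D}{8} \right\rceil
  = \left\lceil \frac{\log_2 77}{8} \right\rceil
  = \left\lceil \frac{6.27}{8} \right\rceil
  = 1 \text{ byte} \;<\; p = 4 \text{ bytes}
\]
\[
\text{Domain Storage column: } N \cdot p = 2155 \times 4 = 8620 \text{ bytes},
\qquad
\text{ID Storage column: } N \cdot 1 = 2155 \text{ bytes}
\]
```

Both models store the domain table once, so in this example the saving comes entirely from the narrower per-tuple reference.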

13 An open source DBMS for handheld devices
Query Processing
- Considerations
  - Minimize writes to secondary storage
  - Efficient usage of limited main memory
    - Read buffer not required
    - Main memory as write buffer
    - If the read:write ratio is very high, flash memory as write buffer
- Need for a left-deep query plan
  - Reduce materialization; if absolutely necessary, use main memory
  - Bushy trees and right-deep trees are ruled out
  - A left-deep tree is best suited for pipelined evaluation
  - The right operand in a left-deep tree is always a stored relation

14 An open source DBMS for handheld devices
Query Processing
- Need for optimal memory allocation
  - If nested loop algorithms are used for every operator, a minimum amount of memory is needed to execute the plan
  - Nested loop algorithms are inefficient
  - Should memory usage be reduced to a minimum at the cost of performance?
  - Different devices come with different memory sizes
  - Query plans should make efficient use of memory
  - Memory must be optimally allocated among all operators
- Need to generate the best query execution plan depending on the available memory

15 An open source DBMS for handheld devices
Query Processing
- Operator evaluation schemes
  - Different schemes for an operator
  - All have different memory usage and cost
  - Schemes conform to a left-deep tree query plan
  - The cost of a scheme is its computation time

16 An open source DBMS for handheld devices
Query Processing
- Schemes for join
  - Nested Loop Join
  - Indexed Nested Loop Join
  - Hash Join
  - Using Join Index
- Schemes for aggregation
  - Nested Loop aggregation
  - Buffered aggregation
- Operator schemes are implemented using the Iterator Model
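To make the Iterator Model concrete, here is a rough Python sketch of a nested loop join written as an open/next/close iterator. It is not the system's C code; the class names `Scan` and `NestedLoopJoin`, the tuple row format, and the use of `None` as the end-of-stream marker are all assumptions made for the example.

```python
class Scan:
    """Iterator over a stored relation (here, just a list of tuples)."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None              # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class NestedLoopJoin:
    """Nested loop join whose right input is a stored relation that is
    rescanned for every tuple pulled from the left input."""

    def __init__(self, left, right, predicate):
        self.left, self.right, self.predicate = left, right, predicate

    def open(self):
        self.left.open()
        self.right.open()
        self.current = self.left.next()

    def next(self):
        while self.current is not None:
            row = self.right.next()
            if row is None:                  # right side exhausted
                self.current = self.left.next()
                self.right.open()            # rescan the stored relation
                continue
            if self.predicate(self.current, row):
                return self.current + row
        return None

    def close(self):
        self.left.close()
        self.right.close()
```

Since `NestedLoopJoin` exposes the same three methods as `Scan`, it can itself serve as the left input of another join, which is what allows a left-deep plan to be pipelined while holding only one left tuple in memory per operator.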

17 An open source DBMS for handheld devices
Query Processing
- Benefit/Size of a scheme
  - Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit memory allocation
  - The minimum scheme for an operator is the scheme that has maximum cost and minimum memory
  - Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$
  - $\min(o) = s_{\min}$ such that $\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{\min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{\min})$
  - $s_{\min}$ is the minimum scheme for operator $o$. Then,
    - $\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{\min}) - \mathrm{Cost}(s_i)$
    - $\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{\min})$
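The benefit and size definitions translate directly into code. The sketch below assumes each scheme is given as a (cost, memory) pair in arbitrary units; the function name `benefit_and_size` and the sample numbers in the usage note are hypothetical.

```python
def benefit_and_size(schemes):
    """Return {name: (benefit, size)} relative to the minimum scheme.

    schemes maps a scheme name to a (cost, memory) pair. The minimum
    scheme is taken to be the one with the least memory, which by the
    definition above also has the highest cost.
    """
    min_name = min(schemes, key=lambda name: schemes[name][1])
    min_cost, min_memory = schemes[min_name]
    return {
        name: (min_cost - cost, memory - min_memory)
        for name, (cost, memory) in schemes.items()
    }
```

For instance, with made-up schemes `{"nested_loop": (100, 2), "indexed": (40, 10), "hash": (25, 22)}`, the minimum scheme `nested_loop` gets (0, 0), while `indexed` gets a benefit of 60 for 8 extra units of memory.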

18 Query Processing
- Every operator is a collection of (size, benefit) points: n points for n schemes
- The operator cost function is the collection of (memory, cost) points of its schemes
[Figure: (Size, Benefit) points for an operator. Axes: Size, Benefit; points (0,0), (s1,b1), (s2,b2).]
[Figure: Operator cost function. Axes: Memory, Cost; points (0,c1), (m2,c2), (m3,c3).]

19 An open source DBMS for handheld devices
Query Processing
- Optimal Memory Allocation: 2-Phase Approach
  - Phase 1: the query is first optimized to get a query plan
  - Phase 2: division of memory among the operators
  - The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory allocation in phase 2 is done on the basis of the cost functions of the schemes
  - Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device
  - Traditional 2-phase optimization cannot be used

20 An open source DBMS for handheld devices
Query Processing
- Optimal Memory Allocation: 1-Phase Approach
  - The query optimizer is made memory cognizant
  - The modified optimizer takes into account the division of memory among operators while choosing between plans
  - Ideally, 1-phase optimization should be done, but the optimizer becomes complex.

21 An open source DBMS for handheld devices
Query Processing
- Modified 2-phase optimizer
  - Optimal division of memory involves selecting the best scheme for every operator
  - Phase 1: determine the optimal left-deep join order using a dynamic programming approach
  - Phase 2:
    a) Divide memory among the operators
    b) Choose the scheme for every operator depending on the memory allocated

22 An open source DBMS for handheld devices
Query Processing
- Exact memory allocation
  - Hulgeri et al. proposed an exact solution to the memory allocation problem
  - Traditional 2-phase optimization
  - Divides memory among operator schemes; the schemes are selected in phase 1
  - Algorithm to divide memory among piecewise linear cost functions
  - Optimal division of memory takes place only at change-over points

23 An open source DBMS for handheld devices
Query Processing
- Exact memory allocation
  - Our operator cost functions are also piecewise linear functions
  - The exact algorithm can be used by replacing the scheme cost function with the operator cost function
  - Division of memory among operator cost functions
  - The amount of memory allocated to each operator will exactly match one of its schemes

24 An open source DBMS for handheld devices
Query Processing
- Heuristic memory allocation
  - A heuristic determines which operator gains the most per unit memory allocation and allocates memory to that operator
  - The gain of every operator is determined by its best feasible scheme
  - Repeat the process until memory allocation is done
  - Heuristic: select the scheme that has the maximum benefit/size ratio

25 An open source DBMS for handheld devices
Query Processing
MemAllocate(MTotal) {
  1.  Mmin = Σ_{i=1..m} Memory(min(i))
  2.  for i = 1 to m do
  3.      Scheme(i) = min(i)
  4.  Mavail = MTotal - Mmin
  5.  RemoveSchemes(Mavail)
  6.  sbest, obest = GetBestScheme(Mavail)
  7.  if no best scheme then return
  8.  else {
  9.      Mavail = Mavail - Memory(sbest) + Memory(Scheme(obest))
  10.     Scheme(obest) = sbest
  11.     RemoveSchemes(sbest, obest, Mavail)
  12.     RecomputeBenefits(sbest, obest)
  13. }
  14. goto step 6
}
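One way the MemAllocate pseudocode could be realized is sketched below in Python. It is an interpretation, not the system's implementation: instead of separate RemoveSchemes and RecomputeBenefits steps, it recomputes every scheme's benefit and size relative to the operator's currently chosen scheme on each pass and skips schemes that do not fit. The input format (one dict of scheme name to (cost, memory) per operator) and the name `mem_allocate` are assumptions.

```python
def mem_allocate(operators, total_memory):
    """Greedy memory allocation over operators.

    operators: list of dicts mapping scheme name -> (cost, memory).
    Returns the chosen scheme name per operator, or None if even the
    minimum schemes do not fit in total_memory.
    """
    # Steps 1-3: start every operator at its minimum-memory scheme.
    chosen = [min(op, key=lambda s: op[s][1]) for op in operators]
    # Step 4: memory left after the minimum schemes are paid for.
    available = total_memory - sum(op[c][1] for op, c in zip(operators, chosen))
    if available < 0:
        return None

    while True:
        best = None  # (ratio, operator index, scheme name, extra memory)
        for i, op in enumerate(operators):
            cur_cost, cur_mem = op[chosen[i]]
            for name, (cost, mem) in op.items():
                size = mem - cur_mem        # extra memory over current scheme
                benefit = cur_cost - cost   # cost saved over current scheme
                # Skip schemes that do not fit or do not help.
                if size <= 0 or size > available or benefit <= 0:
                    continue
                ratio = benefit / size
                if best is None or ratio > best[0]:
                    best = (ratio, i, name, size)
        if best is None:                    # step 7: nothing left to upgrade
            return chosen
        _, i, name, size = best
        chosen[i] = name                    # step 10
        available -= size                   # step 9
```

With made-up inputs such as a join operator `{"nested_loop": (100, 2), "indexed": (40, 10), "hash": (25, 22)}` and an aggregation operator `{"nested_loop_agg": (80, 1), "buffered": (20, 5)}`, a budget of 15 units upgrades the aggregation first (ratio 15) and then the join to `indexed` (ratio 7.5).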

26 Query Processing Recomputation of Benefits
- Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that need more memory than sbest change.
- The new benefit and size values are the differences between their old values and those of sbest.
- Example: Scheme 1 has the highest benefit/size ratio, so Benefit(Scheme 2) = (b2-b1) and Size(Scheme 2) = (s2-s1)
[Figure: Benefit and Size Recomputation. Axes: Size, Benefit; points (0,0), (s1,b1), (s2,b2), with the differences (s2-s1) and (b2-b1) marked.]

27 An open source DBMS for handheld devices
Some other issues
- Data Synchronization
  - Record the changes in a log
  - Merge the changes with the main server
  - Conflict detection and conflict resolution
- Concurrency control
  - Local transactions on the device and the transaction doing data synchronization
  - Minimal concurrency control needed
- Access Rights Management
  - Community handhelds like the Simputer
  - More than a single user

28 Implementation Status
- Developed in the C programming language
- Code base distributed over several subdirectories
- Recursive makefiles to build the system
- Lex and Bison used to write the SQL parser
- Storage Manager, Query Optimizer and Query Executor implemented
- Supports CHAR, INTEGER and FLOAT
- Select, Project, Join, and COUNT
- ID-based Join Index and other aggregate operators not completed

29 Performance Evaluation
- Experimental setup
  - Database system ported to the Simputer, a handheld device
  - Sample healthcare schema and datasets
    - Doctor (91), Drug (77), Visit (830), Prescription (2155)
  - Q1: 3 joins and 2 selections
  - Q4: 3 joins and aggregation over two attributes
  - Data stored in Flat Storage and in ID Storage without Join Index
  - Exact and heuristic memory allocation
  - Response time measured by varying the amount of memory

32 Performance Evaluation
- Conclusions
  - Response times are highest with minimum memory and lowest with maximum memory
  - The computing power of the handheld affects the response time in a big way
  - Heuristic memory allocation differed from the exact algorithm at only a few points
  - Response times are higher for ID Storage due to the extra cost of projection
  - Nested loop aggregation is very costly
  - Join Index should reduce the query execution time

33 An open source DBMS for handheld devices
- Summary
  - Storage Manager, Optimizer and Executor implemented
  - Supports SPJ and COUNT operators
- Contributions
  - A new storage model, ID-based Storage
  - Highlighted the need for optimal memory allocation
  - Existing exact allocation algorithm used with some modifications
  - Heuristic memory allocation algorithm
  - Selection of the best query execution plan depending on the memory available on a device

34 Ongoing and Future work
- Ongoing Work
  - Data synchronization utility
  - Remaining aggregate operators
  - ID-based Join Index
  - Integration with AQUA, an online database-backed discussion forum
- Future Work
  - Feasibility of a 1-phase optimizer
  - DBMS module toolkit
  - An operator that returns the first k results of a query
  - Application-specific DBMS

35 Thank You


