V Storage Manager Shahram Ghandeharizadeh Computer Science Department University of Southern California.

1 V Storage Manager Shahram Ghandeharizadeh Computer Science Department University of Southern California

2 Simple Tests
- Arbitrary.
- Designed to stress your software and ensure its robustness:
  - Repeated creation of the same datazone name.
  - Deletion of records in iteration j when they were already deleted in iteration (j-1).
  - Etc.
- Designed to be scalable:
  - Your test can be implemented with a variable A that captures the number of records in alpha (10,000).
  - The values of all other important parameters can be a function of A, e.g., the number of records for beta and the number of iterations in Test 3.
  - Start with a small value for A, say 100. Once your tests work, increase A to 1,000. Once this works, increase A to 10,000.
  - Do not be surprised to find your code breaking when you increase A from 100 to 1,000. This is the reality of developing robust software: size matters!
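
A minimal sketch of the scalable-test idea, assuming a hypothetical in-memory Datazone class in place of the real V storage manager API (which these slides do not show). Every parameter is derived from the single scale factor A, and each iteration after the first re-deletes records that are already gone, so those deletions must fail gracefully. The ratios A/10 and A/100 are illustrative choices, not requirements of the assignment.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a V datazone; the real storage manager API differs.
class Datazone {
public:
    bool insert(int key, const std::string& payload) {
        return records_.emplace(key, payload).second;  // false on a duplicate key
    }
    bool remove(int key) { return records_.erase(key) == 1; }  // false if absent
    std::size_t size() const { return records_.size(); }

private:
    std::map<int, std::string> records_;
};

// All test parameters are functions of A (the ratios are illustrative).
struct TestParams {
    std::size_t alpha_records;  // A
    std::size_t beta_records;   // A / 10
    std::size_t iterations;     // A / 100, at least 1
};

TestParams make_params(std::size_t A) {
    return {A, A / 10, A / 100 > 0 ? A / 100 : 1};
}

// Deletes the first beta_records keys of alpha in every iteration, so every
// iteration after the first re-deletes records removed in iteration (j-1).
// Returns the number of deletions that (correctly) reported failure.
std::size_t repeated_delete_test(std::size_t A) {
    const TestParams p = make_params(A);
    Datazone alpha;
    for (std::size_t k = 0; k < p.alpha_records; ++k)
        alpha.insert(static_cast<int>(k), "payload");
    std::size_t failed = 0;
    for (std::size_t j = 0; j < p.iterations; ++j)
        for (std::size_t k = 0; k < p.beta_records; ++k)
            if (!alpha.remove(static_cast<int>(k))) ++failed;
    return failed;
}
```

Scaling up is then a one-number change: run repeated_delete_test(100), then 1,000, then 10,000.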

3 Suggestion
- Focus on a single-threaded version of your implementation.
- Once all tests are running, extend it to analyze the impact of multi-threading.
  - This may require revisiting your designs.

4 Heap versus Stack
- The execution of your program uses two kinds of dynamic memory: heap and stack.
- malloc and new allocate memory from the heap.
- The programmer is responsible for freeing this memory and returning it to the heap.
- Invoking a method uses the stack. All local variables declared in a method are placed on the stack. When the method returns, its stack frame is freed.
[Figure: memory layout of a running program: code, static data, heap, stack]

5 Heap versus Stack (Example)
- In method Test1, the character array named “payload” is declared on the stack when Test1 is invoked.
- Its memory is freed when Test1 completes execution.
- The programmer is NOT responsible for managing the memory assigned to payload because it is a local variable managed using the stack.
Test1() { char payload[10000]; Vdt vptr; vptr.set_data(payload); … }
[Figure: memory layout: code, static data, heap, stack]

6 Heap versus Stack (Example)
- In method Test1, the character array named “payload” is assigned memory from the heap (using new).
  - The variable payload itself is on the stack!
  - The memory pointed to by “payload” is allocated from the heap.
- The programmer is responsible for freeing this memory using delete[] (memory obtained with new[] must be released with delete[], not plain delete).
Test1() { char *payload; Vdt vptr; payload = new char[1000]; vptr.set_data(payload); … }
[Figure: memory layout: code, static data, heap, stack]
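
Both versions of Test1 can be checked with a small C++ sketch. The Vdt class below is a hypothetical minimal stand-in (the slides do not show its definition) that only stores a pointer and never frees it; the string contents and return values exist only so the behavior can be tested.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal Vdt: it only remembers a pointer and never frees it.
class Vdt {
public:
    void set_data(void* d) { data_ = d; }
    void* get_data() const { return data_; }

private:
    void* data_ = nullptr;
};

// Slide 5: payload is a local array, so it lives in Test1's stack frame.
std::size_t test1_stack() {
    char payload[10000];
    std::strcpy(payload, "stack");
    Vdt vptr;
    vptr.set_data(payload);
    return std::strlen(static_cast<char*>(vptr.get_data()));
}  // payload's storage is reclaimed automatically here; no delete

// Slide 6: the pointer payload is on the stack, but the 1000 bytes it points
// to come from the heap and must be released explicitly.
std::size_t test1_heap() {
    char* payload = new char[1000];
    std::strcpy(payload, "heap");
    Vdt vptr;
    vptr.set_data(payload);
    const std::size_t n = std::strlen(static_cast<char*>(vptr.get_data()));
    delete[] payload;  // memory from new[] must be released with delete[]
    return n;
}
```

Note that in test1_stack the Vdt must not outlive payload: handing a pointer to stack memory to an object that survives the function would leave it dangling.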

7 Urban Legends about Heap
- The following is FALSE: memory allocated in method X can be freed only in method X. See the example as proof.
- Cause: debugging C/C++ programs is difficult. It is easy to corrupt memory if you are not careful! These errors are difficult to find. They are also stressful, resulting in beliefs that are not true: an urban legend is born.
- How to avoid these kinds of conceptual traps? Write small programs to verify a belief that sounds too good to be true. It is simple and avoids digressions that waste your time and cause a lot of heartache.
GenMem::GenMem(Vdt *v) { char *cptr; cptr = new char[10]; v->set_data(cptr); memcpy(cptr, "Shahram", 8); }
int _tmain(int argc, _TCHAR* argv[]) { Vdt vptr; char *cptr; GenMem *GM = new GenMem(&vptr); cptr = (char *) vptr.get_data(); delete[] cptr; delete GM; printf("Exiting this simple test."); return 0; }
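
A portable version of the GenMem experiment, wrapped in an ordinary function instead of the Windows-specific _tmain. Vdt is again a hypothetical minimal stand-in, since its real definition is not shown. The buffer is allocated with new[] inside the GenMem constructor and released with delete[] in a different function, which is exactly what the urban legend claims is impossible.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical minimal Vdt: holds a pointer to caller-owned data, never frees it.
class Vdt {
public:
    void set_data(void* d) { data_ = d; }
    void* get_data() const { return data_; }

private:
    void* data_ = nullptr;
};

class GenMem {
public:
    // Allocates a heap buffer and hands it to the caller through v.
    explicit GenMem(Vdt* v) {
        char* cptr = new char[10];
        v->set_data(cptr);
        std::memcpy(cptr, "Shahram", 8);  // 7 characters plus the terminating '\0'
    }
};

// Frees, outside GenMem, the buffer that GenMem's constructor allocated.
bool free_in_another_method() {
    Vdt vptr;
    GenMem* gm = new GenMem(&vptr);
    char* cptr = static_cast<char*>(vptr.get_data());
    const bool intact = std::strcmp(cptr, "Shahram") == 0;
    delete[] cptr;  // allocated with new[] in GenMem::GenMem, so delete[] here
    delete gm;      // the GenMem object is heap-allocated too
    std::printf("Exiting this simple test.\n");
    return intact;
}
```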

8 Variant Indexes by P. O’Neil and D. Quass Shahram Ghandeharizadeh Computer Science Department University of Southern California

9 Key Assumptions
- A read-mostly database that is updated infrequently.
- Complex indexes to speed up queries.
- Focuses on physical designs to enhance performance.

10 Example Data Warehouse
- McDonald's keeping track of different sandwich purchases.
SALES(Cid, Pid, Day, Amt, dollar_cost, Unit_sales)
PROD(Pid, Name, Size, Weight, Package_type)
TIME(Day, Week, Month, Year, Holiday, Weekday)

11 Example Data Warehouse
Key observations:
- A handful of products: a PROD table with tens of rows.
- Many millions of rows in the SALES table.
SALES(Cid, Pid, Day, Amt, dollar_cost, Unit_sales)
PROD(Pid, Name, Size, Weight, Package_type)
TIME(Day, Week, Month, Year, Holiday, Weekday)

12 A B+-Tree on the Pid of SALES
Assuming McDonald's sells 12 different products.
SALES rows: Joe, Big Mac, Lab day, …; Mary, Fries, Pres day, …; Harry, Big Mac, Pres day, …; Henry, Big Mac, Pres day, …; Jane, Happy Meal, Pres day, …; Shideh, Happy Meal, Pres day, …; Kam, Happy Meal, Pres day, …; Bob, Big Mac, Pres day, …
B+-tree leaf page entry: (Big Mac, (1,1), (1,3), (1,4), (2,4), …), where each RID is (page, slot).

13 A B+-Tree on the Pid of SALES
[Figure: same SALES rows and leaf page as slide 12]
What happens with a SALES table consisting of a million rows?

14 A B+-Tree on Major Holidays
A B+-tree index on the different holidays of the SALES table.
SALES rows: Joe, Big Mac, Lab day, …; Mary, Fries, Pres day, …; Harry, Big Mac, Pres day, …; Henry, Big Mac, Pres day, …; Jane, Happy Meal, Pres day, …; Shideh, Happy Meal, Pres day, …; Kam, Happy Meal, Pres day, …; Bob, Big Mac, Pres day, …
B+-tree leaf page entry: (Pres day, (1,2), (1,3), (1,4), (2,1), …)

15 A B+-Tree on Major Holidays
[Figure: same SALES rows and leaf page as slide 14, with the leaf entry (Pres day, (1,2), (1,3), (1,4), (2,1), …) labeled “Value List”]

16 A B+-Tree on Major Holidays
[Figure: same as slide 15, with the list of RIDs (1,2), (1,3), (1,4), (2,1), … labeled “RID List”]
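
A Value-List index can be sketched in C++ with std::map standing in for the B+-tree. std::map is an in-memory balanced search tree, not a disk-resident B+-tree, so only the logical key-to-RID-List mapping is modeled. RIDs are (page, slot) pairs numbered from 1 as in the figures; 4 rows per page is an assumption inferred from the figures' RIDs, and Row, ValueListIndex, and build_index are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using RID = std::pair<int, int>;  // (page, slot), both 1-based as in the figures

struct Row {
    std::string customer, product, day;
};

// Key value -> RID-List; each map entry plays the role of one Value-List.
using ValueListIndex = std::map<std::string, std::vector<RID>>;

// Indexes the given column of rows stored in order, rows_per_page rows per page.
ValueListIndex build_index(const std::vector<Row>& rows, int rows_per_page,
                           std::string Row::*column) {
    ValueListIndex index;
    for (std::size_t j = 0; j < rows.size(); ++j) {
        const int n = static_cast<int>(j);
        index[rows[j].*column].push_back({n / rows_per_page + 1, n % rows_per_page + 1});
    }
    return index;
}
```

Indexing the eight rows shown on the product column yields the RID-List (1,1), (1,3), (1,4), (2,4) for Big Mac. With a million-row SALES table and 12 products, each RID-List would hold about 83,000 RIDs on average if sales were spread evenly.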

17 Conjunctive Queries
Count the number of Big Mac sales on “President’s Day”, assuming B+-trees on the product (pid) and day columns of SALES.
- With RID-Lists:
  - Get the Value-List for “Big Mac” using the B+-tree; obtain RID-List1.
  - Get the Value-List for “President’s Day” using the B+-tree; obtain RID-List2.
  - Compute the set intersection of RID-List1 and RID-List2.
  - Count the number of RIDs in the intersection set.
- Is there a better way?
  - Yes: use bitmaps and logical bitwise operations.
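
The RID-List plan can be sketched with std::set_intersection, which requires both inputs to be sorted. The usage example uses only the eight SALES rows shown in the earlier figures, with RIDs as (page, slot) pairs; count_rid_intersection is an illustrative name.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using RID = std::pair<int, int>;  // (page, slot)

// Intersects two RID-Lists and returns the number of RIDs they share.
std::size_t count_rid_intersection(std::vector<RID> list1, std::vector<RID> list2) {
    std::sort(list1.begin(), list1.end());  // set_intersection needs sorted input
    std::sort(list2.begin(), list2.end());
    std::vector<RID> both;
    std::set_intersection(list1.begin(), list1.end(), list2.begin(), list2.end(),
                          std::back_inserter(both));
    return both.size();
}
```

For those rows, Big Mac gives {(1,1), (1,3), (1,4), (2,4)} and President's Day gives {(1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4)}, so the count is 3 (Harry, Henry, Bob). With millions of rows, materializing and intersecting such lists dominates the cost.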

18 Bitmap Indexes
- Use a bitmap to represent the existence of a record with a certain attribute value.
- Example: if a record has the indexed attribute value “Big Mac”, then its corresponding entry in the bitmap is set to one. Otherwise, it is zero.

19 A Bitmap
- A bitmap B is defined on table T as a sequence of M bits.
- For each row r with row number j that has the property P, we set bit j in B to one; all other bits are set to zero.
- Assuming fixed-size disk pages that hold p records, the RID of record j is (j/p, j%p): the page is j/p and the slot number is j%p (integer division and remainder, with rows, pages, and slots numbered from 0).
[Figure: bitmap entry for Record 0 (Pres Day, …)]

20 A Bitmap
[Figure: same definition as slide 19, now showing Record 1 (Pres Day, …)]

21 A Bitmap
[Figure: same definition as slide 19, now showing Record 2 (Pres Day, …)]
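
A minimal sketch of the bitmap definition and the RID formula, assuming 0-based row, page, and slot numbers. std::vector<bool> stands in for an on-disk bitmap, and build_bitmap and rid_of are hypothetical helper names.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Bitmap = std::vector<bool>;

// Bit j of B is 1 iff row j has property P; all other bits are 0.
Bitmap build_bitmap(std::size_t M, const std::function<bool(std::size_t)>& P) {
    Bitmap B(M, false);
    for (std::size_t j = 0; j < M; ++j) B[j] = P(j);
    return B;
}

// Fixed-size pages holding p records: record j is at page j / p, slot j % p.
std::pair<std::size_t, std::size_t> rid_of(std::size_t j, std::size_t p) {
    return {j / p, j % p};
}
```

With p = 4, record 5 maps to page 1, slot 1, so the bitmap needs no stored RIDs: the bit position alone locates the record.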

22 A B+-Tree on Major Holidays
A B+-tree index on the different holidays of the SALES table.
[Figure: same SALES rows as slide 14; B+-tree leaf page entry (Pres day, …)]

23 Logical Bit-Wise Operations
- Three key operators: AND, OR, NOT.
- Assume a bitmap consisting of 4 bits:
  - 0011 AND 0101 = 0001
  - 0011 OR 0101 = 0111
  - NOT 0011 = 1100
- This paper assumes bitmaps consisting of millions, if not billions, of bits. In Example 3.1, the authors assume a bitmap consisting of 100,000,000 bits, i.e., 12.5 megabytes.
  - A large bitmap is stored in a sequence of disk pages. Each disk page full of bits is termed a fragment.
- Some bit positions may correspond to non-existent rows. An Existence Bitmap (EBM) has 1 bits in exactly those positions that correspond to existing rows.
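
The 4-bit examples can be checked with std::bitset, and the size figures with simple arithmetic. The fragment count assumes 4 KB (4,096-byte) pages, the page size used in the SALES example later in this lecture; bitmap_bytes and fragment_count are hypothetical helper names.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

using Bits4 = std::bitset<4>;  // to_string() prints the highest bit first, e.g. "0011"

// Bytes needed for a bitmap of the given number of bits.
std::size_t bitmap_bytes(std::size_t bits) { return (bits + 7) / 8; }

// Number of fragments (disk pages full of bits) needed to store the bitmap.
std::size_t fragment_count(std::size_t bits, std::size_t page_bytes) {
    const std::size_t bits_per_page = page_bytes * 8;
    return (bits + bits_per_page - 1) / bits_per_page;  // round up
}
```

Under these assumptions, a 100,000,000-bit bitmap occupies 12,500,000 bytes (12.5 MB) spread over 3,052 fragments.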

24 Summary ANY QUESTIONS?

25 Range Predicate
SELECT target-list
FROM T
WHERE C-range
C-range = {C > c1, C >= c1, C = c1, C <= c1, C < c1, C between c1 and c2}
How to process with a bitmap index?

28 Conjunctive Queries
Count the number of Big Mac sales on “President’s Day”, assuming B+-trees on the product (pid) and day columns of SALES.
- With RID-Lists:
  - Get the Value-List for “Big Mac” using the B+-tree; obtain RID-List1.
  - Get the Value-List for “President’s Day” using the B+-tree; obtain RID-List2.
  - Compute the set intersection of RID-List1 and RID-List2.
  - Count the number of RIDs in the intersection set.
- With bitmaps:
  - Get the Value-List for “Big Mac” using the B+-tree; obtain bitmap1.
  - Get the Value-List for “President’s Day” using the B+-tree; obtain bitmap2.
  - Recall that the Existence Bitmap (EBM) identifies the rows that exist.
  - Let RES = the logical AND of bitmap1, bitmap2, and EBM.
  - Count the number of bits set to one in RES to find how many Big Macs were sold on “President’s Day”.
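
The bitmap plan can be sketched with bitmaps packed into 64-bit words, so one AND processes 64 rows at a time and std::bitset<64>::count() counts the ones in each word. The word layout and the helper names make_bitmap and count_conjunction are assumptions for illustration, not the paper's storage format; the usage example reuses the eight SALES rows from the earlier figures.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A bitmap packed into 64-bit words: bit j is bit (j % 64) of word j / 64.
using Bitmap = std::vector<std::uint64_t>;

// Packs a 0/1 vector (one entry per row number) into a Bitmap.
Bitmap make_bitmap(const std::vector<int>& bits) {
    Bitmap b((bits.size() + 63) / 64, 0);
    for (std::size_t j = 0; j < bits.size(); ++j)
        if (bits[j] != 0) b[j / 64] |= std::uint64_t{1} << (j % 64);
    return b;
}

// RES = bitmap1 AND bitmap2 AND EBM, one 64-row word at a time; returns the
// number of 1 bits in RES.
std::size_t count_conjunction(const Bitmap& b1, const Bitmap& b2, const Bitmap& ebm) {
    assert(b1.size() == b2.size() && b2.size() == ebm.size());
    std::size_t ones = 0;
    for (std::size_t w = 0; w < b1.size(); ++w)
        ones += std::bitset<64>(b1[w] & b2[w] & ebm[w]).count();
    return ones;
}
```

For those rows the count is 3. If Bob's row is deleted, only its EBM bit is cleared, and the count drops to 2 without touching the value bitmaps.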

29 Example 2.1

30 Projection Index
- Reminiscent of vertical partitioning.
- Once the qualifying records are found, the projection index enables the system to find the amt attribute value of each such record with a few disk I/Os.
[Figure: SALES(cid, pid, holiday, amt) with rows Labor day, Presidents day, Labor day, …, beside a projection index holding only the amt values]
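
A minimal sketch of why a projection index needs few I/Os: the amt values are stored densely in row-number order, so the values of the qualifying rows cluster on a small number of index pages. The values_per_page parameter, the use of distinct pages touched as a proxy for disk I/Os, and the names ProjectionIndex and sum_found_set are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Projection index on amt: amt[j] is the amt value of row j, stored in row order.
struct ProjectionIndex {
    std::vector<std::int64_t> amt;
    std::size_t values_per_page;  // how many amt values fit on one disk page
};

// Sums amt over the qualifying rows (sorted row numbers) and counts the distinct
// index pages touched, a rough proxy for the number of disk I/Os.
std::int64_t sum_found_set(const ProjectionIndex& pi, const std::vector<std::size_t>& rows,
                           std::size_t& pages_touched) {
    std::int64_t sum = 0;
    pages_touched = 0;
    bool have_page = false;
    std::size_t current_page = 0;
    for (std::size_t j : rows) {
        const std::size_t page = j / pi.values_per_page;
        if (!have_page || page != current_page) {
            ++pages_touched;
            current_page = page;
            have_page = true;
        }
        sum += pi.amt[j];
    }
    return sum;
}
```

With 8-byte values and 4 KB pages, 512 amt values fit on a page, compared with 20 full 200-byte rows per page in the SALES example later in this lecture.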

31 Projection Index (Definition)
Page 41, first paragraph of Section 2.2

32 Projection Index (Example Usage)
Page 41, middle of the left-hand column:

33 Bit-Sliced Indexes: Motivation
Assume the “Amt” values are in dollars, as follows:

34 Bit-Sliced Indexes: Motivation
Assume the “Amt” values are in dollars, as follows. Their binary representation is:

35 Bit-Sliced Indexes: Motivation
Now, number the records in order, as before:

36 Bit-Sliced Indexes: Motivation
Construct a bit-sliced index: Bit 0, Bit 1, Bit 2.

37 Bit-Sliced Indexes: Motivation
To compute the sum of all records using the existence bitmap Bnn:
[Figure: bit slices Bit 0, Bit 1, Bit 2] ?

38 Bit-Sliced Indexes: Motivation
To compute the sum of all records using the existence bitmap Bnn:
1 * (7 records with bit 0 set to 1) + 2 * (4 records with bit 1 set to 1) + 4 * (2 records with bit 2 set to 1)

39 Bit-Sliced Indexes: Motivation
To compute the sum of all records using the existence bitmap Bnn:
1 * (7 records with bit 0 set to 1) + 2 * (4 records with bit 1 set to 1) + 4 * (2 records with bit 2 set to 1) = (1 * 7) + (2 * 4) + (4 * 2) = 23
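
The same computation, SUM = sum over i of 2^i * COUNT(Bi AND Bf), can be sketched in C++. The slide's table of Amt values was not preserved, so the values in the usage example are hypothetical, chosen so that the slice counts are also 7, 4, and 2. build_slices and bit_sliced_sum are illustrative names, and Bf is the found set (pass Bnn to sum all rows).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<bool>;  // one bit per row number

// Bit slices: bit j of B[i] is 1 iff bit i of values[j] is 1.
std::vector<Bitmap> build_slices(const std::vector<std::uint32_t>& values, int nbits) {
    std::vector<Bitmap> B(nbits, Bitmap(values.size(), false));
    for (std::size_t j = 0; j < values.size(); ++j)
        for (int i = 0; i < nbits; ++i) B[i][j] = ((values[j] >> i) & 1u) != 0;
    return B;
}

// SUM = sum over i of 2^i * COUNT(B[i] AND Bf); pass Bnn as Bf to sum all rows.
std::uint64_t bit_sliced_sum(const std::vector<Bitmap>& B, const Bitmap& Bf) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < B.size(); ++i) {
        std::uint64_t count = 0;  // number of found-set rows with bit i set
        for (std::size_t j = 0; j < Bf.size(); ++j)
            if (B[i][j] && Bf[j]) ++count;
        sum += count << i;  // count * 2^i
    }
    return sum;
}
```

With the values {1, 3, 7, 3, 5, 1, 1, 2} and Bf = Bnn, the slice counts are 7, 4, and 2, and the sum is 23, matching the slide. The rows themselves are never read, only one bitmap per bit position.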

40 Bit-Sliced Indexes: Definition
Interpret the value of the “Amt” column as an integer number of pennies, represented as a binary number with N+1 bits (bits 0 through N). Define bit slice Bi, for i = 0, …, N, as the bitmap whose bit for row j is 1 exactly when row j has a non-null Amt whose bit i is 1, and Bnn as the bitmap of rows whose Amt is not null.

41 Bit-Sliced Indexes: Definition
(Definition as on slide 40.) Why maintain Bnn?

42 Bit-Sliced Indexes: Definition
(Definition as on slide 40.) A scalar expression involving a null evaluates to null, while aggregates such as SUM ignore nulls; either way, a null Amt must not be treated as 0. Yet a null Amt has a 0 in every bit slice, exactly like an Amt of 0, so Bnn is needed to tell the two apart. Example, see: ag/oracle/05-jul/o45sql.html

43 Bit-Sliced Index
- 20 bitmaps for the “Amt” column represent quantities up to 2^20 - 1 pennies, i.e., $10,485.75.
- If we assume normal sales range up to $100.00 and all values are equally likely to occur, a Value-List index would have nearly 10,000 different values. A bitmap representation would lose its effectiveness, since each value's bitmap would be very sparse. However, bit-sliced indexes continue to perform well.

44 Example with Value-List Index
- Assume the SALES table has 100 million rows. Each row is 200 bytes long. A disk page is 4 KB, holding 20 rows.
- Query: SELECT SUM(AMT) FROM SALES WHERE condition
- Bitmap Bf = the foundset
- Bitmap Bv for each value
- Bnn = existence bitmap
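
The sizes implied by these assumptions work out as follows, taking 1 MB = 10^6 bytes as in the 12.5 MB figure for a 100,000,000-bit bitmap:

```latex
\text{SALES size} = 10^{8}\ \text{rows} \times 200\ \text{bytes} = 2 \times 10^{10}\ \text{bytes},
\qquad
\text{SALES pages} = \frac{10^{8}\ \text{rows}}{20\ \text{rows/page}} = 5 \times 10^{6}\ \text{pages},
\qquad
\text{one bitmap} = \frac{10^{8}\ \text{bits}}{8\ \text{bits/byte}} = 1.25 \times 10^{7}\ \text{bytes} = 12.5\ \text{MB}.
```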

45 Example with Bit-Sliced Indexes
- Query: SELECT SUM(AMT) FROM SALES WHERE condition
- Bitmap Bf = the foundset
- Bitmap Bv for each value
- Bnn = existence bitmap
- 20 bits: Bit 0, …; Bit 1, …; …; Bit 19, …

46 Other Aggregate Functions
- Ignore MEDIAN & Column-Product.
- SELECT AGG(C) FROM T WHERE condition
  - AGG(C) is COUNT, SUM, AVG, MIN, or MAX.

47 Range Queries
SELECT target-list FROM T WHERE C-range
C-range = {C > c1, C >= c1, C = c1, C <= c1, C < c1, C between c1 and c2}

48 Bit-Sliced Indexes
Assume c1 = 3, i.e., {011}. Bit slices: Bit 0, Bit 1, Bit 2. Initially BGT = BLT = {} (empty) and BEQ = Bnn.
Iteration 1 on Bit 2:
If bit 2 is on in constant c1: {} (does not apply, since bit 2 of 011 is 0)
Else: BGT = BGT | (BEQ & B2); BEQ = BEQ & ~(B2)
[Figure: BLT, BGT, BEQ after iteration 1]

49 Bit-Sliced Indexes
Assume c1 = 3, i.e., {011}. Bit slices: Bit 0, Bit 1, Bit 2.
Iteration 2 on Bit 1: bit 1 is on in constant c1, so
BLT = BLT | (BEQ & NOT(B1))
BEQ = BEQ & B1
[Figure: BLT, BGT, BEQ after iteration 2]

50 Bit-Sliced Indexes
Assume c1 = 3, i.e., {011}. Bit slices: Bit 0, Bit 1, Bit 2.
Iteration 3 on Bit 0: bit 0 is on in constant c1, so
BLT = BLT | (BEQ & NOT(B0))
BEQ = BEQ & B0
[Figure: final BLT, BGT, BEQ after iteration 3]
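
The three iterations are one loop over the slices, from the most significant bit down to bit 0. This sketch assumes BEQ starts as Bnn and BLT and BGT start empty; the Amt values in the usage example are hypothetical (the slide's values were not preserved), and compare_with_constant, build_slices, and count_ones are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<bool>;

// Bit slices: bit j of B[i] is 1 iff bit i of values[j] is 1.
std::vector<Bitmap> build_slices(const std::vector<std::uint32_t>& values, int nbits) {
    std::vector<Bitmap> B(nbits, Bitmap(values.size(), false));
    for (std::size_t j = 0; j < values.size(); ++j)
        for (int i = 0; i < nbits; ++i) B[i][j] = ((values[j] >> i) & 1u) != 0;
    return B;
}

struct Comparison {
    Bitmap lt, gt, eq;  // rows with C < c1, C > c1, C = c1
};

// Compares every non-null row with c1 using only slice operations, from bit N
// down to bit 0 (assumes at most 32 slices, since c1 is 32 bits wide).
Comparison compare_with_constant(const std::vector<Bitmap>& B, const Bitmap& Bnn,
                                 std::uint32_t c1) {
    const std::size_t M = Bnn.size();
    Bitmap BLT(M, false), BGT(M, false), BEQ = Bnn;
    for (std::size_t i = B.size(); i-- > 0;) {
        const bool bit_on = ((c1 >> i) & 1u) != 0;
        for (std::size_t j = 0; j < M; ++j) {
            if (bit_on) {
                BLT[j] = BLT[j] || (BEQ[j] && !B[i][j]);  // BLT = BLT | (BEQ & NOT(Bi))
                BEQ[j] = BEQ[j] && B[i][j];               // BEQ = BEQ & Bi
            } else {
                BGT[j] = BGT[j] || (BEQ[j] && B[i][j]);   // BGT = BGT | (BEQ & Bi)
                BEQ[j] = BEQ[j] && !B[i][j];              // BEQ = BEQ & NOT(Bi)
            }
        }
    }
    return {BLT, BGT, BEQ};
}

std::size_t count_ones(const Bitmap& b) {
    std::size_t n = 0;
    for (bool bit : b) n += bit ? 1 : 0;
    return n;
}
```

With the values {1, 3, 7, 3, 5, 1, 1, 2} and c1 = 3, C < 3 selects rows 0, 5, 6, 7, C = 3 selects rows 1 and 3, and C > 3 selects rows 2 and 4; C <= 3 is then simply BLT | BEQ.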

51 Bit-Sliced Indexes & Range Queries
Note that C < c1, C <= c1, C = c1, C >= c1, and C > c1 are all computed using BEQ, BLT, and BGT; for example, C <= c1 is BLT | BEQ and C >= c1 is BGT | BEQ.

52 Range Queries

53 Variant Indexes
You are not responsible for Section 5, OLAP-style queries.

