
1 ISI’02 Multidimensional Databases Challenge: representation for efficient storage, indexing & querying Examples (time-series, images) New multidimensional.


1 ISI’02 Multidimensional Databases
Challenge: representation for efficient storage, indexing & querying
Examples (time-series, images)
New multidimensional data sets & approaches
- Graphs (e.g., road networks)
- Immersidata (e.g., haptic)
- User profiles & aggregation/clustering

2 Challenges
Storing multidimensional data (matrix vs. relations)
Indexing multidimensional data (R-tree)
Queries
- Search for similar objects (similarity search) [ICDE’00, ICME’00]
- Spatial and temporal queries [IDEAS’00, ACM-GIS’01, KAIS’02]
Multidimensional data mining
- Aggregation [EDBT’02, PODS’02]
- Clustering [ACM-MMj’02]
- Classification [INFORMS’02]
- Finding outliers [SSDBM’01]

3 [Figure: stock price series S1 to Sn, $price vs. day (1 to 365)]
- Each series as is: a point in 365 dimensions (computationally complex)
- f(S1) to f(Sn), e.g., avg and std: a point in 2 dimensions (not accurate enough)
- g(S1) to g(Sn), transformation-based (FFT, Wavelet): a point in 5 dimensions [SSDBM’00, 01]
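The trade-off on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method from the cited papers: the function names, the two synthetic series, and the choice of keeping the magnitudes of the first 5 DFT coefficients are all assumptions made for the example.

```python
import cmath
import math
import statistics

def summary_features(series):
    """f(S): map a series to a 2-D point (mean, population std)."""
    return (statistics.fmean(series), statistics.pstdev(series))

def dft_features(series, k=5):
    """g(S): magnitudes of the first k DFT coefficients, scaled by 1/sqrt(n)."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / math.sqrt(n))
    return feats

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical 365-day price series: same mean and std, different shape
# (one slow cycle per year vs. three cycles per year).
s1 = [100 + 10 * math.sin(2 * math.pi * d / 365) for d in range(365)]
s2 = [100 + 10 * math.sin(2 * math.pi * 3 * d / 365) for d in range(365)]

print("365-D distance:", euclidean(s1, s2))
print("2-D distance:  ", euclidean(summary_features(s1), summary_features(s2)))
print("5-D distance:  ", euclidean(dft_features(s1), dft_features(s2)))
```

In this example the 2-D (avg, std) points coincide even though the series differ, while the 5-D transformed points stay well apart. With the 1/sqrt(n) scaling, the 5-D distance never exceeds the 365-D distance, which is the property that lets an index filter on the short vectors without missing true matches.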

4 More Similarity Search & Clustering
- Images: color histograms over Red, Green, Blue (values 0 to 255; more accurate); example colors (R, G, B) = (208, 125, 100) and (80, 100, 210)
- Shapes: angle sequences [ICDE’99 … ICME’00]
- Web navigations: (hit) feature vectors over pages P1 P2 P3 P4 P5 … [RIDE’97 … WebKDD’01]
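As a rough illustration of the color-histogram feature, the sketch below bins RGB pixels and compares two images by histogram intersection. The bin count, the intersection measure, and the tiny "images" are assumptions chosen for the example; the slide does not say which similarity measure the cited work uses. The two pixel colors are the ones shown on the slide.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over (R, G, B) bins; channel values are 0..255."""
    width = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical 4-pixel images built from the two colors on the slide.
warm = (208, 125, 100)
cool = (80, 100, 210)
image_a = [warm, warm, warm, cool]
image_b = [warm, cool, cool, cool]

hist_a = color_histogram(image_a)
hist_b = color_histogram(image_b)
print(histogram_intersection(hist_a, hist_a))
print(histogram_intersection(hist_a, hist_b))
```

More bins per channel give a finer, more accurate description at the cost of a longer feature vector, which is the same dimensionality trade-off as for time series.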

5 On-Line Analytical Processing (OLAP)
Multidimensional data sets:
- Dimension attributes (e.g., Store, Product, Date)
- Measure attributes (e.g., Sale, Price)
Range-sum queries
- Average sale of shoes in CA in 2001
- Number of jackets sold in Seattle in Sep. 2001
Tougher queries:
- Covariance of sale and price of jackets in CA in 2001 (correlation)
- Variance of price of jackets in 2001 in Seattle
Market-Relation:
Store Location | Product | Date    | Sale    | Price
LA             | Shoes   | Jan. 01 | $21,500 | $85.99
NY             | Jacket  | June 01 | $28,700 | $45.99
...            | ...     | ...     | ...     | ...
Relational evaluation: Avg(sale) over the selections (p = shoe), (s in CA), (d in 2001) on Market-Relation: Too Slow!
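To make the "Too Slow!" point concrete, here is a minimal sketch of answering the first range-sum query by scanning the relation. The `Row` type, the `state` and `year` columns, and the third row are hypothetical additions for the example; only the first two rows come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Row:
    store: str
    state: str      # hypothetical column derived from store location
    product: str
    year: int       # hypothetical column derived from the date
    sale: float
    price: float

def avg_sale(rows, product, state, year):
    """Average sale over rows matching all three selections; scans every row."""
    matches = [r.sale for r in rows
               if r.product == product and r.state == state and r.year == year]
    return sum(matches) / len(matches) if matches else None

market = [
    Row("LA", "CA", "Shoes", 2001, 21500.0, 85.99),    # from the slide
    Row("NY", "NY", "Jacket", 2001, 28700.0, 45.99),   # from the slide
    Row("SF", "CA", "Shoes", 2001, 18500.0, 79.99),    # made up for the example
]

print(avg_sale(market, "Shoes", "CA", 2001))
```

Every query touches every row, so the cost grows with the size of the relation rather than with the size of the answer.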

6 Example Solution (Pre-computation): Prefix-sum [Agrawal et al. 1997]
Age | Salary
25  | $50k
28  | $55k
30  | $58k
50  | $100k
55  | $130k
57  | $120k
[Figure: Salary (ticks $40k, $55k, $65k, $100k, $120k, $150k) vs. Age (ticks 0, 25, 40, 50, 60, 80)]
Query: Sum(salary) when (25 < age < 40) and (55k < salary < 150k)
Result: I – II – III + IV
Issues:
- Measure attribute should be pre-selected
- Aggregation function should be pre-selected (sum or count)
- Updates are expensive (need re-computation)
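The "I – II – III + IV" result can be sketched with a two-dimensional prefix-sum array over the slide's data. This is a minimal sketch under stated assumptions: the bucket edges are the axis ticks from the figure, buckets are treated as half-open intervals [lo, hi) (so the row with salary exactly $55k falls inside the query, unlike the strict inequality on the slide), and the names `build_prefix` and `range_sum` are made up for the example.

```python
from bisect import bisect_right

AGE_EDGES = [0, 25, 40, 50, 60, 80]
SAL_EDGES = [40, 55, 65, 100, 120, 150]          # in $k
DATA = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]

def build_prefix(data):
    """Bucket the salaries, then pre-compute cumulative sums over the grid."""
    rows, cols = len(AGE_EDGES) - 1, len(SAL_EDGES) - 1
    cube = [[0] * cols for _ in range(rows)]
    for age, sal in data:                        # assumes values below the last edge
        i = bisect_right(AGE_EDGES, age) - 1
        j = bisect_right(SAL_EDGES, sal) - 1
        cube[i][j] += sal
    # prefix[i][j] holds the sum of cube[0..i-1][0..j-1]; row/column 0 stay zero.
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (cube[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    return prefix

def range_sum(prefix, i_lo, i_hi, j_lo, j_hi):
    """Sum over buckets i_lo..i_hi-1 and j_lo..j_hi-1 using four lookups."""
    return (prefix[i_hi][j_hi] - prefix[i_lo][j_hi]
            - prefix[i_hi][j_lo] + prefix[i_lo][j_lo])

prefix = build_prefix(DATA)
# Age bucket [25, 40) and salary buckets [55, 150): rows (28, 55) and (30, 58).
print(range_sum(prefix, 1, 2, 1, 5))
```

The query cost is constant regardless of how many rows fall in the range, but a single inserted row can change many entries of the prefix array, which is the update cost listed under Issues.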

7 Spatial & Temporal Data: Complex Queries
Data types:
- A point
- A line-segment
- A line: sequence of line-segments
- A region: a closed set of lines
- Moving point (e.g., car, train, …)
- Changing region (e.g., changing temperature of a county)
Queries:
[Figure: query template with choices Rivers, Countries, Hospitals, Cities, Taxi; 5km of Home; 10 min]
Experiments BrainR [Visual’99] [ACM-GIS’01, VLDB’01]

8 Spatial & Temporal Data & Queries
Data types:
- A point
- A line-segment
- A line: sequence of line-segments
- A region: a closed set of lines
- Moving point (e.g., objects, car, train, …)
Queries:
[Figure: query template with choices Molecules, Microbes, Train-stations, Cities, Round objects; 5cm of Hand; 10 s; Number of distractions in of subject; Station]

9 Spatial & Temporal Data & Queries …
K Nearest Neighbor queries: find the k nearest objects to a query point (e.g., the 5 closest hospitals to my car)
- What is nearest? In a road network (or a graph), distance is the “shortest path”, which is complex to compute in real time for all points of interest
- Approach: embed the graph into a high-dimensional space where computationally simple Minkowski metrics (e.g., Euclidean) can approximate real distances [ACM-GIS’02?]
[Figure: points A, B, C in 2-D space mapped by embedding techniques (e.g., Lipschitz) to n-D space]
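A Lipschitz-style embedding can be sketched as follows: each node gets one coordinate per reference set, equal to its shortest-path distance to the nearest node in that set. The sketch below is an illustration only; the toy graph, the reference sets, and the use of the Chebyshev metric (the Minkowski metric with p = infinity, chosen here because it never overestimates the network distance) are assumptions, not details from the cited work.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def lipschitz_embed(graph, reference_sets):
    """Coordinate i of node u = distance from u to the closest node in set i."""
    all_dist = {u: dijkstra(graph, u) for u in graph}   # offline; assumes a connected graph
    return {u: [min(all_dist[u][r] for r in refs) for refs in reference_sets]
            for u in graph}

def chebyshev(a, b):
    """Minkowski metric with p = infinity."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical undirected road network: node -> [(neighbor, edge length)].
graph = {
    "A": [("B", 2), ("C", 6)],
    "B": [("A", 2), ("C", 3)],
    "C": [("A", 6), ("B", 3), ("D", 1)],
    "D": [("C", 1)],
}
embedding = lipschitz_embed(graph, [{"A"}, {"D"}])

def knn(query, k):
    """k nearest nodes to query by embedded distance, no path search at query time."""
    others = [u for u in embedding if u != query]
    return sorted(others, key=lambda u: chebyshev(embedding[query], embedding[u]))[:k]

print(embedding)
print(knn("A", 2))
```

The expensive shortest-path work happens once, offline; at query time each candidate costs only a vector comparison. With more reference sets the approximation generally tightens, at the price of more dimensions to index.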

10 Immersidata and Mining Queries [CIKM’01, UACHI’01]

11 Immersidata and Mining Queries …
A dynamic sign, e.g., ASL colors

12 User Profiles & Clustering: Offline Processes
[Figure components: User Profiles (User 1 to User 6, User U-6 to User U); PPED Similarity Measure and Clustering; Clusters; Voting; Favorite Features (Rock = High, Classical = Low, Pop = Low, Rap = High); Item Database; Fuzzy Aggregation]
Cluster Wish-list: 0.87 0.83 0.72 0.47 0.61

13 User Profiles & Clustering: Online Processes
[Figure components: Current User’s Profile; PPED Similarity Measure; Clusters; Fuzzy Aggregation]
A List of Similarity Values: 0.65 0.79 0.32
Cluster Wish-lists (shown three times): 0.87 0.83 0.72 0.47 0.61
User Wish-List: 0.87 0.83 0.82 0.79 0.72 0.70 0.68 0.65 0.63 0.61 0.54 0.47 0.42

