1
ISI’02: Multidimensional Databases
Challenge: representation for efficient storage, indexing & querying
- Examples (time-series, images)
New multidimensional data sets & approaches
- Graphs (e.g., road networks)
- Immersidata (e.g., haptic)
- User profiles & aggregation/clustering
2
Storing multidimensional data (matrix vs. relations)
Indexing multidimensional data (R-tree)
Queries
- Search for similar objects (similarity search) [ICDE’00, ICME’00]
- Spatial and temporal queries [IDEAS’00, ACM-GIS’01, KAIS’02]
Multidimensional data mining
- Aggregation [EDBT’02, PODS’02]
- Clustering [ACM-MMj’02]
- Classification [INFORMS’02]
- Finding outliers [SSDBM’01]
Challenges
3
Stock Prices
[Figure: time series S1 … Sn, each plotting $price against day (1 to 365)]
- Each series S1 … Sn: a point in 365 dimensions (computationally complex)
- f(S1) … f(Sn), e.g., avg and std: a point in 2 dimensions (not accurate enough)
- g(S1) … g(Sn): a point in 5 dimensions; transformation-based: FFT, Wavelet [SSDBM’00, 01]
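The trade-off on this slide can be sketched in a few lines of Python. This is an illustration only, not the method of the cited papers: `summary_features` plays the role of f (avg, std), `dft_features` plays the role of g using the first five coefficients of a naive discrete Fourier transform, and the two synthetic 365-day series are hypothetical.

```python
import cmath
import math
import statistics


def summary_features(series):
    """f(S): map a series to a 2-D point (mean, population std)."""
    return (statistics.fmean(series), statistics.pstdev(series))


def dft_features(series, k=5):
    """g(S): magnitudes of the first k DFT coefficients, scaled by 1/sqrt(n)."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / math.sqrt(n))
    return feats


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical price series: same average and spread, different shape.
DAYS = 365
s1 = [100 + 10 * math.sin(2 * math.pi * 1 * t / DAYS) for t in range(DAYS)]
s2 = [100 + 10 * math.sin(2 * math.pi * 3 * t / DAYS) for t in range(DAYS)]

# In 2 dimensions the two series are indistinguishable; in 5 they are not.
print(euclidean(summary_features(s1), summary_features(s2)))
print(euclidean(dft_features(s1), dft_features(s2)))
```

The naive transform costs O(nk) per series, which is fine for a sketch; a real system would use an FFT or wavelet library instead.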
4
More Similarity Search & Clustering
- Images: (Red, Green, Blue) values, e.g., (208, 125, 100) and (80, 100, 210); color histograms over R, G, B (0 ... 255) are more accurate
- Shapes: angle sequences [ICDE’99 … ICME’00]
- Web navigations: (hit) feature vectors over pages P1, P2, P3, P4, P5, … [RIDE’97 … WebKDD’01]
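A color-histogram comparison might look like the following minimal sketch. The quantization into 4 bins per channel and the histogram-intersection score are assumptions chosen for illustration; the slide does not say which binning or similarity measure the cited work uses. The two pixel colors come from the slide, while the three tiny "images" are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint histogram over quantized (R, G, B) values in 0..255.

    Assumes bins_per_channel divides 256 and pixels is non-empty.
    """
    width = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


WARM = (208, 125, 100)  # color value shown on the slide
COOL = (80, 100, 210)   # color value shown on the slide

# Hypothetical four-pixel images.
image_a = [WARM, WARM, WARM, COOL]
image_b = [COOL, COOL, COOL, WARM]
image_c = [WARM, WARM, WARM, WARM]

hist_a, hist_b, hist_c = (color_histogram(p) for p in (image_a, image_b, image_c))
print(histogram_intersection(hist_a, hist_b))  # mostly different colors
print(histogram_intersection(hist_a, hist_c))  # mostly the same color
```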
5
On-Line Analytical Processing (OLAP)
Multidimensional data sets:
- Dimension attributes (e.g., Store, Product, Date)
- Measure attributes (e.g., Sale, Price)
Range-sum queries:
- Average sale of shoes in CA in 2001
- Number of jackets sold in Seattle in Sep. 2001
Tougher queries:
- Covariance of sale and price of jackets in CA in 2001 (correlation)
- Variance of price of jackets in 2001 in Seattle

Market-Relation:
Store Location   Product   Date      Sale      Price
LA               Shoes     Jan. 01   $21,500   $85.99
NY               Jacket    June 01   $28,700   $45.99
...              ...       ...       ...       ...

Query: Avg(sale) over Market-Relation where (p = shoe), (s in CA), (d in 2001): Too Slow!
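One way to see why the "tougher" queries are still aggregation queries: variance and covariance can be assembled from plain range-sums of x, y, y² and xy, using Var(y) = E[y²] − E[y]² and Cov(x, y) = E[xy] − E[x]E[y]. The sketch below scans the relation, which is exactly the slow path the slide complains about; it only illustrates the arithmetic. The `state` column and the three CA jacket rows are hypothetical additions; the first two rows come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    store: str
    state: str      # hypothetical column, added so "in CA" can be tested
    product: str
    year: int
    sale: float
    price: float


ROWS = [
    Sale("LA", "CA", "Shoes", 2001, 21500.0, 85.99),   # row from the slide
    Sale("NY", "NY", "Jacket", 2001, 28700.0, 45.99),  # row from the slide
    Sale("LA", "CA", "Jacket", 2001, 100.0, 10.0),     # hypothetical
    Sale("SF", "CA", "Jacket", 2001, 200.0, 20.0),     # hypothetical
    Sale("SD", "CA", "Jacket", 2001, 300.0, 30.0),     # hypothetical
]


def range_sums(rows, predicate):
    """One scan that collects the sums every query below is built from."""
    n = sx = sy = syy = sxy = 0.0
    for r in rows:
        if predicate(r):
            n += 1
            sx += r.sale
            sy += r.price
            syy += r.price * r.price
            sxy += r.sale * r.price
    return {"n": n, "x": sx, "y": sy, "yy": syy, "xy": sxy}


def average_sale(s):
    return s["x"] / s["n"]


def variance_price(s):
    """Population variance: E[y^2] - E[y]^2."""
    mean = s["y"] / s["n"]
    return s["yy"] / s["n"] - mean * mean


def covariance_sale_price(s):
    """Population covariance: E[xy] - E[x]E[y]."""
    return s["xy"] / s["n"] - (s["x"] / s["n"]) * (s["y"] / s["n"])


shoes_ca = range_sums(ROWS, lambda r: (r.product, r.state, r.year) == ("Shoes", "CA", 2001))
jackets_ca = range_sums(ROWS, lambda r: (r.product, r.state, r.year) == ("Jacket", "CA", 2001))
print(average_sale(shoes_ca))
print(covariance_sale_price(jackets_ca), variance_price(jackets_ca))
```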
6
Example Solution (Pre-computation): Prefix-sum [Agrawal et al. 1997]

Age   Salary
25    $50k
28    $55k
30    $58k
50    $100k
55    $130k
57    $120k

[Figure: grid over Age (ticks 0, 25, 40, 50, 60, 80) and Salary (ticks $40k, $55k, $65k, $100k, $120k, $150k)]

Query: Sum(salary) when (25 < age < 40) and (55k < salary < 150k)
Result: I – II – III + IV
Issues:
- Measure attribute should be pre-selected
- Aggregation function should be pre-selected (sum or count)
- Updates are expensive (need re-computation)
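The four-term result above is inclusion-exclusion over a pre-computed prefix-sum array. The following is a minimal sketch using the slide's six records; the bucketing is an assumption: each axis is cut at the tick marks into half-open cells [tick, next tick), so the query is answered as 25 ≤ age < 40 and 55k ≤ salary < 150k, and a record sitting exactly on the largest tick would fall outside the grid.

```python
from bisect import bisect_right

AGE_TICKS = [0, 25, 40, 50, 60, 80]
SALARY_TICKS = [40, 55, 65, 100, 120, 150]  # in $k
RECORDS = [(25, 50), (28, 55), (30, 58), (50, 100), (55, 130), (57, 120)]  # (age, salary in $k)


def build_prefix_sums(records):
    """Return P where P[i][j] is the salary sum over cells [0, i) x [0, j)."""
    rows, cols = len(AGE_TICKS) - 1, len(SALARY_TICKS) - 1
    cube = [[0] * cols for _ in range(rows)]
    for age, salary in records:
        i = bisect_right(AGE_TICKS, age) - 1        # age cell index
        j = bisect_right(SALARY_TICKS, salary) - 1  # salary cell index
        cube[i][j] += salary
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (cube[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    return prefix


def range_sum(prefix, i_lo, i_hi, j_lo, j_hi):
    """Sum over cells [i_lo, i_hi) x [j_lo, j_hi) with four lookups."""
    return (prefix[i_hi][j_hi] - prefix[i_lo][j_hi]
            - prefix[i_hi][j_lo] + prefix[i_lo][j_lo])


P = build_prefix_sums(RECORDS)
# Age cell 1 is [25, 40); salary cells 1..4 cover [55k, 150k).
print(range_sum(P, 1, 2, 1, 5))  # 113, i.e. $55k + $58k
```

Each query costs four array lookups regardless of range size, which is the appeal; the cost shows up on updates, since changing one cell invalidates every prefix sum that covers it.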
7
Spatial & Temporal Data, Complex Queries
Data types:
- A point
- A line-segment
- A line: sequence of line-segments
- A region: a closed set of lines
- Moving point (e.g., car, train, …)
- Changing region (e.g., changing temperature of a county)
Queries:
[Figure: query builder with object types (Rivers, Countries, Hospitals, Cities, Taxi), a distance constraint (5km of Home), and a time constraint (10 min)]
Experiments
BrainR [Visual’99]
[ACM-GIS’01, VLDB’01]
8
Spatial & Temporal Data & Queries
Data types:
- A point
- A line-segment
- A line: sequence of line-segments
- A region: a closed set of lines
- Moving point (e.g., objects, car, train, …)
Queries:
[Figure: query builder with object types (Molecules, Microbes, Train-stations, Cities, Round objects, Station), a distance constraint (5cm of Hand), and a time constraint (10 s)]
- Number of distractions in … of subject
9
Spatial & Temporal Data & Queries …
K Nearest Neighbor queries: find the k nearest objects to a query point (e.g., the 5 closest hospitals to my car)
- What is nearest? In a road network (or a graph), distance is the “shortest path”, which is complex to compute in real time for all points of interest
- Approach: embed the graph into a high-dimensional space where computationally simple Minkowski metrics (e.g., Euclidean) can approximate real distances [ACM-GIS’02?]
[Figure: points A, B, C in 2-D space mapped by embedding techniques (e.g., Lipschitz) to n-D space]
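A Lipschitz-style embedding can be sketched as follows, assuming singleton reference sets (individual landmark nodes): each node's coordinates are its shortest-path distances to the landmarks. This sketch uses the Chebyshev metric (the Minkowski metric with p = ∞) rather than Euclidean, because for an undirected graph the triangle inequality guarantees it never overestimates the true network distance. The road network, node names, and landmark choice are hypothetical, and the graph is assumed to be connected.

```python
import heapq
from itertools import combinations


def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def embed(graph, landmarks):
    """Map each node to its vector of distances to the landmarks."""
    tables = [dijkstra(graph, landmark) for landmark in landmarks]
    return {v: tuple(t[v] for t in tables) for v in graph}


def chebyshev(p, q):
    """Minkowski distance with p = infinity."""
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical undirected road network (symmetric edge weights).
ROADS = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 2.0},
    "C": {"A": 5.0, "B": 2.0, "D": 1.0},
    "D": {"C": 1.0},
}

points = embed(ROADS, landmarks=["A", "D"])  # pre-computed offline
for u, v in combinations(ROADS, 2):
    exact = dijkstra(ROADS, u)[v]            # expensive at query time
    approx = chebyshev(points[u], points[v])  # cheap at query time
    print(u, v, exact, approx)
```

Because the approximation is a lower bound, it can be used to filter candidates for a k-nearest-neighbor query before running exact shortest-path computations on the survivors.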
10
Immersidata and Mining Queries [CIKM’01, UACHI’01]
11
Immersidata and Mining Queries …
A dynamic sign, e.g., ASL colors
12
User Profiles & Clustering: Offline Processes
[Figure: diagram with the following components]
- User Profiles: User 1, User 2, User 3, User 4, User 5, User 6, …, User U-6, User U-5, User U-4, User U-3, User U-2, User U-1, User U
- Voting
- Favorite Features (Rock = High, Classical = Low, Pop = Low, Rap = High)
- PPED Similarity Measure and Clustering
- Clusters
- Fuzzy Aggregation
- Item Database
- Cluster Wish-list: 0.87, 0.83, 0.72, 0.47, 0.61
13
User Profiles & Clustering: Online Processes
[Figure: diagram with the following components]
- Current User’s Profile
- PPED Similarity Measure
- Clusters
- A List of Similarity Values: 0.65, 0.79, 0.32
- Cluster Wish-lists (each shown as 0.87, 0.83, 0.72, 0.47, 0.61)
- Fuzzy Aggregation
- User Wish-List: 0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, 0.65, 0.63, 0.61, 0.54, 0.47, 0.42