San Diego, 06/12/03. Martin Pfeifle, Database Group, University of Munich. Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects.

Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects
Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Matthias Schubert
ACM SIGMOD 2003, San Diego, California, June 9-12, 2003
Database Group, Institute for Computer Science, University of Munich, Germany

Outline of the Talk
- Introduction
- Space Partitioning Models
- Data Partitioning Models
  - Vector Set Model (new)
- Evaluation
- Conclusion

Introduction
System requirements:
- The system should help to reduce the cost of developing new parts
- Avoid "reinventing the wheel"
- Reuse existing parts
[Figure: a similarity query on a CAD database of complex spatial objects either runs into a timeout or returns unsuitable results; the goal is a similarity query that returns meaningful results in a comparatively short time]
Solution:
- Efficient similarity search
- Effective similarity search
Both are addressed by a similarity model based on sets of feature vectors.

Outline of the Talk: Introduction, Space Partitioning Models, Data Partitioning Models, Evaluation, Conclusion. Next: Space Partitioning Models.

Space Partitioning Models: Feature Transformation
- A 3D CAD object is represented by a mesh of triangles
- Voxelization of the triangle meshes and object normalization
- Partitioning of the data space into disjoint, enumerated cells
- Extraction of k spatial features for each cell
- Similarity of objects = proximity of the corresponding feature vectors
[Figure: pipeline from the CAD system via triangle meshes and the normalized, voxelized object to the feature vector]

Space Partitioning Models: Notation
- The data space is partitioned into $p$ axis-parallel grid cells in each dimension
- Let
  - $r$ = the raster (voxel) resolution
  - $V^o$ = the set of voxels representing object $o \in O$
  - $V_i^o$ = the set of voxels covered by $o$ in cell $i$
  - $f_o(i)$ = the $i$-th value of the feature vector of $o$
[Figure: 2D example of a CAD object and its voxel set $V^o$, with $r = 9$ and $p = 3$]

Space Partitioning Models: The Volume Model
- Count the number of object voxels $|V_i^o|$ in each cell $i$
- Normalize by the voxel capacity $K$ of each cell
- Feature value for cell $i$: $f_o(i) = \frac{|V_i^o|}{K}$, where $K = \left(\frac{r}{p}\right)^3$ in the 3D case
[Figure: 2D example; the per-cell voxel counts are scaled by 1/9]
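
The volume model can be sketched in a few lines. This is an illustration only, not the authors' implementation: the input format (a set of integer voxel triples), the function name `volume_features`, the cell enumeration order, and the requirement that $r$ be a multiple of $p$ are all assumptions made for the example.

```python
def volume_features(voxels, r, p):
    """Volume-model feature vector of one object.

    voxels: set of (x, y, z) integer triples in [0, r)^3 (assumed input format)
    r: raster resolution per dimension
    p: number of grid cells per dimension
    """
    if r % p != 0:
        raise ValueError("this sketch assumes r is a multiple of p")
    side = r // p            # voxels along one edge of a cell
    capacity = side ** 3     # K = (r / p)^3 in the 3D case
    counts = [0] * (p ** 3)
    for x, y, z in voxels:
        # Enumerate cells in x-major order (an arbitrary but fixed choice).
        cell = (x // side) * p * p + (y // side) * p + (z // side)
        counts[cell] += 1
    # f_o(i) = |V_i^o| / K
    return [count / capacity for count in counts]
```

For example, with `r = 4` and `p = 2` each cell holds 8 voxels, so a cell that is completely filled yields the feature value 1.0 and a cell with a single object voxel yields 0.125.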

Space Partitioning Models: The Solid Angle Model
- The solid angle model measures the concavity and convexity of surfaces
- Compute the SA-value $SA(v)$ for each surface voxel $v$ of object $o$: $SA(v) = \frac{|S_v \cap V^o|}{|S_v|}$, where $S_v$ is a voxelized reference sphere around $v$
- Each cell is represented by one dimension in the feature vector:
  - $f_o(i) = 0$ if cell $i$ contains no voxel of $o$
  - $f_o(i) = 1$ if cell $i$ contains only inside voxels of $o$
  - $f_o(i) = \frac{1}{m} \sum_{j=1}^{m} SA(v_j)$ otherwise, where $v_1, \ldots, v_m$ are the surface voxels of $o$ in cell $i$
[Figure: 2D example with reference spheres $S_x$ and $S_y$ around surface voxels $x$ and $y$, and the resulting feature values per cell]
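
A rough sketch of the solid angle features follows. Several details are assumptions that the slide does not fix: a voxel counts as a surface voxel when at least one of its six face neighbors is not part of the object, the reference sphere has a hypothetical radius of 2 voxels, and cells are enumerated in x-major order. All function names are illustrative.

```python
from itertools import product

# Face neighbors used to decide whether a voxel is a surface voxel (assumption).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def sphere_offsets(radius):
    """Integer offsets of a voxelized reference sphere S_v (assumed shape)."""
    rng = range(-radius, radius + 1)
    return [(dx, dy, dz) for dx, dy, dz in product(rng, repeat=3)
            if dx * dx + dy * dy + dz * dz <= radius * radius]

def solid_angle(v, voxels, offsets):
    """SA(v) = |S_v intersected with V^o| / |S_v|."""
    x, y, z = v
    hits = sum((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in offsets)
    return hits / len(offsets)

def is_surface(v, voxels):
    x, y, z = v
    return any((x + dx, y + dy, z + dz) not in voxels for dx, dy, dz in NEIGHBORS)

def solid_angle_features(voxels, r, p, radius=2):
    """One feature per cell: 0 (empty), 1 (inside voxels only), else mean SA."""
    side = r // p
    offsets = sphere_offsets(radius)
    total = [0] * (p ** 3)      # object voxels per cell
    surface = [0] * (p ** 3)    # surface voxels per cell (m)
    sa_sum = [0.0] * (p ** 3)   # sum of SA-values per cell
    for v in voxels:
        x, y, z = v
        cell = (x // side) * p * p + (y // side) * p + (z // side)
        total[cell] += 1
        if is_surface(v, voxels):
            surface[cell] += 1
            sa_sum[cell] += solid_angle(v, voxels, offsets)
    features = []
    for i in range(p ** 3):
        if total[i] == 0:
            features.append(0.0)
        elif surface[i] == 0:
            features.append(1.0)
        else:
            features.append(sa_sum[i] / surface[i])
    return features
```

Convex regions such as corners produce small SA-values because most of the reference sphere lies outside the object, while concave regions produce values closer to 1.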

Outline of the Talk: Introduction, Space Partitioning Models, Data Partitioning Models, Evaluation, Conclusion. Next: Data Partitioning Models.

Data Partitioning Models: Cover Sequence Model
- Approximation of the object by means of a cover sequence (Jagadish 91)
- Cover sequence: $S_k = (((C_0 \,\sigma_1\, C_1) \,\sigma_2\, C_2) \ldots \sigma_k\, C_k)$, where $\sigma_i \in \{+, -\}$, $k$ is the number of covers, and the $C_i$ are axis-parallel (hyper-)rectangles
- Approximation quality: symmetric volume difference $Err_k = |o \;\mathrm{XOR}\; S_k|$
- Computation of $S_k$ by means of a greedy algorithm
- The object is represented by a $6 \cdot k$ dimensional feature vector (3D case)
2D example:
- $S_1 = (C_0 + C_1)$, $Err_1 = 14$
- $S_2 = ((C_0 + C_1) + C_2)$, $Err_2 = 10$
- $S_3 = (((C_0 + C_1) + C_2) - C_3)$, $Err_3 = 7$
- 2D feature vector $f_o$: $f_o^{4i+1}$ = x-position of $C_i$, $f_o^{4i+2}$ = y-position of $C_i$, $f_o^{4i+3}$ = x-extension of $C_i$, $f_o^{4i+4}$ = y-extension of $C_i$
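
The greedy construction can be illustrated on a small 2D grid. The sketch below is a brute-force variant under stated assumptions, not the algorithm of Jagadish 91 or of the authors: it enumerates every axis-parallel rectangle at each step, starts from an empty approximation, and uses the lower-left corner of a rectangle as its position. The names `greedy_cover_sequence` and `feature_vector` are hypothetical.

```python
from itertools import product

def rect_cells(x, y, w, h):
    """Grid cells covered by the rectangle with corner (x, y) and extension (w, h)."""
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}

def greedy_cover_sequence(obj, size, k):
    """Pick k covers; each step minimizes Err = |obj XOR S| over all rectangles and signs.

    obj: set of (x, y) cells of the object on a size x size grid
    Returns a list of (sign, (x, y, w, h), error_after_this_step).
    """
    approx = set()
    covers = []
    for _ in range(k):
        best = None
        for x, y in product(range(size), repeat=2):
            for w, h in product(range(1, size - x + 1), range(1, size - y + 1)):
                cells = rect_cells(x, y, w, h)
                for sign, candidate in (("+", approx | cells), ("-", approx - cells)):
                    err = len(obj ^ candidate)   # symmetric difference
                    if best is None or err < best[0]:
                        best = (err, sign, (x, y, w, h), candidate)
        err, sign, rect, approx = best
        covers.append((sign, rect, err))
    return covers

def feature_vector(covers):
    """Concatenate (x-position, y-position, x-extension, y-extension) per cover."""
    return [value for _, rect, _ in covers for value in rect]
```

For an L-shaped object on a 4 x 4 grid, two covers are enough to reach an error of zero, and the resulting 2D feature vector has 4 values per cover.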

Data Partitioning Models: Vector Set Model
[Figure: a query object and a database object, each approximated by a cover sequence $S_4$]
- $S_4^{query}$ (original order) $= ((((C_0 + C_1) - C_2) - C_3) - C_4)$
- $S_4^{query}$ (optimal order) $= ((((C_0 + C_1) - C_3) - C_4) - C_2)$
- The Euclidean distance between the concatenated feature vectors depends on the order of the covers:
  $d_{euclid}\big((q_1, q_2, q_3, q_4), (db_1, db_2, db_3, db_4)\big) \gg d_{euclid}\big((q_1, q_3, q_4, q_2), (db_1, db_2, db_3, db_4)\big)$
  where each $q_i$ and $db_i$ stands for the position and extension values (px, py, ex, ey) of one cover

Data Partitioning Models: Vector Set Model
- The cover sequence $S_k = (((C_0 \,\sigma_1\, C_1) \,\sigma_2\, C_2) \ldots \sigma_k\, C_k)$ is represented by a set of vectors $X \subset \mathbb{R}^6$, $|X| \le k$ (in the 3D case)
- Distance measure between two vector sets $X$ and $Y$: the minimum weight perfect matching
  - create a complete bipartite graph $G = (X \cup Y, X \times Y)$
  - the weight of each edge $(x, y) \in X \times Y$ is $d_{euclid}(x, y)$
  - weight function for unmatched nodes if $|X| \neq |Y|$: distance to a dummy cover
  - computed by the Kuhn-Munkres algorithm in $O(k^3)$
[Figure: 2D example; the covers of the query object and the database object as points in the space spanned by position X, position Y, extension X, extension Y]
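
The matching distance can be sketched as follows. Two things differ from the slide and are assumptions of this example: the optimum is found by brute force over all permutations (factorial time, acceptable only for small $k$) instead of the Kuhn-Munkres algorithm, and the dummy cover is taken to be the zero vector. `matching_distance` and `concatenated_distance` are illustrative names.

```python
from itertools import permutations
from math import dist

def matching_distance(X, Y):
    """Minimum weight perfect matching distance between two vector sets.

    The smaller set is padded with zero-vector dummy covers (assumption), so an
    unmatched vector contributes its distance to the dummy.
    """
    if len(X) < len(Y):
        X, Y = Y, X
    if not X:
        return 0.0
    dummy = (0.0,) * len(X[0])
    padded = list(Y) + [dummy] * (len(X) - len(Y))
    # Brute force over all assignments; Kuhn-Munkres would do this in O(k^3).
    return min(sum(dist(x, y) for x, y in zip(X, perm))
               for perm in permutations(padded))

def concatenated_distance(X, Y):
    """Euclidean distance of the concatenated vectors (cover sequence model)."""
    flat_x = [value for x in X for value in x]
    flat_y = [value for y in Y for value in y]
    return dist(flat_x, flat_y)
```

If two objects consist of the same covers listed in a different order, `matching_distance` returns 0 while `concatenated_distance` does not, which is the order problem the vector set model is meant to remove.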

Data Partitioning Models: Vector Set Model
- Efficient similarity queries based on multi-step query processing
  - range queries (Faloutsos et al. 94)
  - k-nearest neighbor queries (Korn et al. 96)
  - optimal multi-step k-nearest neighbor search (Seidl, Kriegel 98)
- The filter step (index-based) produces candidates; the refinement step (exact evaluation) produces the results
- The lower bounding property guarantees no false drops: $\forall o_1, o_2 \in O: d_o(o_1, o_2) \ge d_f(o_1, o_2)$
- $k$ (= cardinality of the two vector sets) times the distance between the centroids of the two vector sets lower-bounds the minimum weight perfect matching distance
[Figure: query object and database object with the query centroid and the database centroid in the space spanned by position X, position Y, extension X, extension Y]
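
A minimal filter-and-refine range query illustrates the centroid filter. It rests on assumptions that go beyond the slide: both sets are padded to $k$ vectors with zero-vector dummy covers before the centroid is taken, the filter is a linear scan rather than an index such as the X-tree, and the exact distance is computed by brute force. All names are hypothetical.

```python
from itertools import permutations
from math import dist

def centroid(X, k):
    """Centroid of a non-empty X padded to k vectors with zero-vector dummies."""
    dim = len(X[0])
    return tuple(sum(x[d] for x in X) / k for d in range(dim))

def filter_distance(X, Y, k):
    """k times the centroid distance; never exceeds the matching distance."""
    return k * dist(centroid(X, k), centroid(Y, k))

def exact_distance(X, Y, k):
    """Minimum weight perfect matching by brute force (stand-in for Kuhn-Munkres)."""
    dim = len(X[0])
    dummy = (0.0,) * dim
    xs = list(X) + [dummy] * (k - len(X))
    ys = list(Y) + [dummy] * (k - len(Y))
    return min(sum(dist(x, y) for x, y in zip(xs, perm))
               for perm in permutations(ys))

def range_query(database, query, eps, k):
    """Multi-step range query: cheap filter first, exact refinement second."""
    candidates = [obj for obj in database if filter_distance(query, obj, k) <= eps]
    return [obj for obj in candidates if exact_distance(query, obj, k) <= eps]
```

The bound follows from the triangle inequality: $k$ times the centroid distance equals the norm of the difference of the two vector sums, which cannot exceed the sum of the pairwise distances of any matching. Objects discarded by the filter therefore cannot be part of the exact result.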

Outline of the Talk: Introduction, Space Partitioning Models, Data Partitioning Models, Evaluation, Conclusion. Next: Evaluation.

Evaluation: k-nn Queries
- Evaluation of similarity models by means of k-nn queries: report the k objects having the smallest distance to a query object q
- Problem: evaluation using k-nn queries is subjective; the quality measure of a model depends on the choice of the query objects
[Figure: k-nn result lists with their distances for the volume model and the solid angle model, asking which is a "good" and which a "bad" similarity model]

Evaluation: Hierarchical Clustering
- Hierarchical clustering: more objective, since each object of the database is taken into account to measure the quality of a similarity model
- OPTICS (Kriegel et al. 99)
  - yields a density-based hierarchical clustering
  - insensitive to input parameters
  - the result (so-called reachability plot) can be easily visualized and is suitable for interactive exploration
[Figure: data space with cluster A (containing A1 and A2) and cluster B, next to the corresponding reachability plot]

Evaluation: Space Partitioning Similarity Models
Car dataset, approx. 200 parts, r = 30, p = 3
[Figure: reachability plots for the volume model and the solid angle model; labels in the plots: Class A, Class B, Class C, and "no classes found"]

Evaluation: Data Partitioning Similarity Models
Car dataset, approx. 200 parts, r = 15, 7 covers
[Figure: reachability plots for the cover sequence model (labels: classes A, C, E, G, X) and the vector set model (labels: classes A1, A2, B, C, D, E, F, G1, G2)]

Evaluation: Efficiency of the Vector Set Model
- Efficiency evaluation: 100 10-nn queries on the plane database, cover sequence with 7 covers
  Model                              CPU time [sec]   I/O time [sec]   total runtime [sec]
  vector set without filter          1025.32          806.40           1831.72
  vector set with filter (X-tree)    105.88           932.80           1038.68
  cover sequence (X-tree)            142.82           2632.06          2774.88
- Vector set model vs. cover sequence model: the vector set model outperforms the cover sequence model
- Vector set model without filter vs. with filter:
  - the filter step leads to a speed-up factor of approximately 2
  - the filter step has a selectivity of approximately 20%
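
The stated speed-up can be checked against the total runtimes in the table. The ratios below are plain arithmetic on the reported numbers, not additional measurements; the subscript names are shorthand introduced here.

```latex
\begin{align*}
\frac{t_{\text{without filter}}}{t_{\text{with filter}}} &= \frac{1831.72}{1038.68} \approx 1.76 \\
\frac{t_{\text{cover sequence}}}{t_{\text{with filter}}} &= \frac{2774.88}{1038.68} \approx 2.67 \\
\frac{\text{CPU}_{\text{without filter}}}{\text{CPU}_{\text{with filter}}} &= \frac{1025.32}{105.88} \approx 9.68
\end{align*}
```

The gain of the filter step comes from CPU time, which drops by almost a factor of 10, while I/O time is slightly higher with the filter (932.80 sec versus 806.40 sec).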

Outline of the Talk: Introduction, Space Partitioning Models, Data Partitioning Models, Evaluation, Conclusion. Next: Conclusion.

Conclusion
Contribution:
- Sets of feature vectors: a new way of representing objects in similarity search, somewhere between feature vectors and graphs
- Effective and efficient similarity model for CAD data based on sets of feature vectors
- Evaluation of similarity models based on hierarchical clustering
[Figure: covers of the query object and the database object in the space spanned by position X, position Y, extension X, extension Y]

Conclusion
Future work:
- BOSS (Browsing OPTICS-Plots for Similarity Search)
- Interactive data browsing tool based on reachability plots
- User-friendly method to support the time-consuming task of finding similar parts:
  - revealing the hierarchical clustering structure of the dataset at a glance
  - displaying suitable representatives for large clusters

Thank you for your attention. Any questions?
