Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order
Edgar Chávez, Karina Figueroa, Gonzalo Navarro
Universidad Michoacana, Mexico

Presentation transcript:

Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order
Edgar Chávez, Karina Figueroa, Gonzalo Navarro
Universidad Michoacana, Mexico; Universidad de Chile, Chile

Contents
1. About the problem
2. Basic concepts
3. Previous work
4. Our technique
5. Experiments
6. Conclusion and future work

Proximity searching: the database is huge, exact searching is not possible, and the distance function is expensive to compute.

Applications: information retrieval, classification, finding people through the web, and clustering. Currently used for classification of spider webs and for face recognition on the Chilean web.

Problems (metric spaces): indexing, feature extraction, complex objects, high dimensionality, limited memory, and huge databases.

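Terminology. Queries: a range query (q, r) retrieves every element within distance r of q; a k-nearest-neighbor (kNN) query retrieves the k elements closest to q. Properties of the distance d: symmetry, d(x, y) = d(y, x); strict positiveness, d(x, y) > 0 whenever x ≠ y; and the triangle inequality, d(x, z) ≤ d(x, y) + d(y, z).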
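The two query types can be pinned down with a brute-force baseline that compares the query against every element. This is a minimal sketch, assuming points are tuples of floats under the Euclidean distance; the function names are hypothetical and are not taken from the talk. Every index discussed next tries to answer the same queries while computing far fewer distances.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance: symmetric, strictly positive, obeys the triangle inequality."""
    return math.dist(a, b)

def range_query(db: Sequence[Point], q: Point, r: float,
                d: Callable[[Point, Point], float] = euclidean) -> List[Point]:
    """All elements u with d(q, u) <= r, found by scanning the whole database."""
    return [u for u in db if d(q, u) <= r]

def knn_query(db: Sequence[Point], q: Point, k: int,
              d: Callable[[Point, Point], float] = euclidean) -> List[Point]:
    """The k elements closest to q (ties keep database order)."""
    return heapq.nsmallest(k, db, key=lambda u: d(q, u))

db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
print(range_query(db, (0.0, 0.0), 1.5))  # [(0.0, 0.0), (1.0, 0.0)]
print(knn_query(db, (0.0, 0.0), 3))      # [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
```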

Previous work: pivot-based and partition-based algorithms. [Figure: pivot-based approach, showing a pivot, the query q, and their distance.]
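The usual pivot-based filter can be sketched as follows; this is the textbook technique, not code from the talk, and it assumes Euclidean points with hypothetical function names. Distances from every element to a few pivots are precomputed; at query time, |d(q, p) − d(u, p)| is a lower bound on d(q, u) by the triangle inequality, so any element whose bound exceeds r is discarded without being compared to q.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def build_pivot_table(db: Sequence[Point], pivots: Sequence[Point]) -> List[List[float]]:
    """Preprocessing: store d(u, p) for every element u and every pivot p."""
    return [[math.dist(u, p) for p in pivots] for u in db]

def pivot_range_query(db: Sequence[Point], pivots: Sequence[Point],
                      table: List[List[float]], q: Point, r: float):
    """Range query that skips elements using the triangle inequality.

    |d(q, p) - d(u, p)| <= d(q, u), so if that bound exceeds r for any pivot p,
    u cannot be an answer and d(q, u) is never computed.
    Returns (answers, number of d(q, u) computations on database elements).
    """
    dq = [math.dist(q, p) for p in pivots]
    answers: List[Point] = []
    comparisons = 0
    for u, du in zip(db, table):
        if any(abs(a - b) > r for a, b in zip(dq, du)):
            continue  # discarded without comparing u to q
        comparisons += 1
        if math.dist(q, u) <= r:
            answers.append(u)
    return answers, comparisons

db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
pivots = [(0.0, 0.0)]
table = build_pivot_table(db, pivots)
print(pivot_range_query(db, pivots, table, (0.0, 0.0), 1.5))
# ([(0.0, 0.0), (1.0, 0.0)], 2)
```

In high dimensions the distances from elements to a pivot concentrate around one value, so the lower bound rarely exceeds r and few elements are discarded; this is why pivot-based methods degrade as dimension grows.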

Previous work: pivot-based and partition-based algorithms. [Figure: partition-based approach, showing a cluster center and the query q.]
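A partition-based index can be sketched with a simple flat clustering; this is a generic illustration, not the specific structure shown on the slide, and the centers, names, and Euclidean points are assumptions. Each element joins its nearest center and each zone records its covering radius rc; at query time a whole zone is skipped when d(q, c) − rc > r, because the triangle inequality then puts every element of the zone farther than r from q.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def build_partitions(db: Sequence[Point], centers: Sequence[Point]):
    """Assign each element to its nearest center; record each zone's covering radius."""
    zones: Dict[int, List[Point]] = {i: [] for i in range(len(centers))}
    for u in db:
        nearest = min(range(len(centers)), key=lambda j: math.dist(u, centers[j]))
        zones[nearest].append(u)
    radii = [max((math.dist(c, u) for u in zones[i]), default=0.0)
             for i, c in enumerate(centers)]
    return zones, radii

def partition_range_query(centers: Sequence[Point], zones: Dict[int, List[Point]],
                          radii: List[float], q: Point, r: float) -> List[Point]:
    """Scan only zones whose ball (center c, covering radius rc) may meet the query ball."""
    answers: List[Point] = []
    for i, c in enumerate(centers):
        if math.dist(q, c) - radii[i] > r:
            continue  # every u in the zone has d(q, u) >= d(q, c) - rc > r
        answers.extend(u for u in zones[i] if math.dist(q, u) <= r)
    return answers

db = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
zones, radii = build_partitions(db, centers)
print(partition_range_query(centers, zones, radii, (0.5, 0.0), 1.0))
# [(0.0, 0.0), (1.0, 0.0)]
```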

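Our technique: permutations. [Figure: an element u and the permutants p1, …, p6.] Each element u is represented by the permutation that lists the permutants in increasing order of their distance from u.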
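Computing an element's permutation only needs its distances to the permutants, followed by a sort. A minimal sketch, assuming Euclidean points and representing permutant pi by the index i − 1; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def permutation(u: Point, permutants: Sequence[Point]) -> List[int]:
    """Indices of the permutants, sorted by increasing distance from u.

    Python's sort is stable, so permutants at equal distance keep index order.
    """
    return sorted(range(len(permutants)), key=lambda i: math.dist(u, permutants[i]))

permutants = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # p1, p2, p3 as indices 0, 1, 2
print(permutation((9.0, 1.0), permutants))  # [1, 0, 2], i.e. p2, p1, p3
print(permutation((1.0, 8.0), permutants))  # [2, 0, 1], i.e. p3, p1, p2
```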

Our technique: elements that match exactly have the same permutation, and similar elements should have similar permutations (a conjecture). The Spearman footrule metric measures how similar two permutations are and is used to compare the most promising elements first.

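Spearman footrule metric: the sum, over all permutants, of the absolute difference of their positions in the two permutations, F(Πu, Πq) = Σ |pos_u(p) − pos_q(p)|. Example (pairs of positions from the slide): 3-1, 6-2, 3-2, 4-1, 5-5, 6-4.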
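The footrule reduces to inverting each permutation (permutant to position) and summing absolute position differences. A minimal sketch with hypothetical function names; the usage compares the four permutations from the query-time slide against the query permutation (p2, p1, p3).

```python
from typing import Dict, Hashable, Sequence

def positions(perm: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Inverse permutation: maps each permutant to its position in perm."""
    return {p: i for i, p in enumerate(perm)}

def spearman_footrule(perm_a: Sequence[Hashable], perm_b: Sequence[Hashable]) -> int:
    """Sum over all permutants of |position in perm_a - position in perm_b|."""
    pos_a, pos_b = positions(perm_a), positions(perm_b)
    return sum(abs(pos_a[p] - pos_b[p]) for p in pos_a)

q = ("p2", "p1", "p3")
for perm in [("p2", "p1", "p3"), ("p2", "p3", "p1"), ("p3", "p1", "p2"), ("p3", "p2", "p1")]:
    print(perm, spearman_footrule(perm, q))
# footrule values: 0, 2, 4, 4
```

The values reproduce the order shown on the query-time slide: (p2, p1, p3) first, then (p2, p3, p1), with (p3, p1, p2) among the last.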

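Searching process (part 1): preprocessing time. With permutants p1, p2, p3, each database element is stored together with its permutation, for example (p3, p1, p2), (p3, p2, p1), (p2, p1, p3), (p2, p3, p1).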
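The preprocessing step can be sketched as follows. Choosing the permutants as a random sample of the database is an assumption of this sketch, as are the Euclidean points and the function names. Building the index costs one distance computation per element and permutant.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def permutation(u: Point, permutants: Sequence[Point]) -> List[int]:
    """Permutant indices sorted by increasing distance from u."""
    return sorted(range(len(permutants)), key=lambda i: math.dist(u, permutants[i]))

def build_permutation_index(db: Sequence[Point], num_permutants: int, seed: int = 0):
    """Preprocessing: choose the permutants and store every element's permutation.

    Picking the permutants as a random sample of the database is an assumption
    of this sketch. Cost: num_permutants distance computations per element.
    """
    rng = random.Random(seed)
    permutants = rng.sample(list(db), num_permutants)
    perms = [permutation(u, permutants) for u in db]
    return permutants, perms

rng = random.Random(42)
db = [(rng.random(), rng.random()) for _ in range(1000)]
permutants, perms = build_permutation_index(db, num_permutants=8)
print(perms[0])  # some ordering of the indices 0..7
```

Each stored permutation is just num_permutants small integers, so the index stays compact even when the objects themselves are large.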

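Searching process (part 2): query time. The query q gets its own permutation, (p2, p1, p3). The database elements are then sorted by the Spearman footrule between their permutation and that of q: (p2, p1, p3), (p2, p3, p1), ….., (p3, p1, p2), and are compared with q in that order.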
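Query time can be sketched as an approximate kNN search: compute the query's permutation, order the database by footrule, and compute real distances only for the first fraction of that order. The names, the Euclidean points, and the `fraction` parameter are assumptions of this sketch; with fraction = 1.0 it degenerates to an exact (but exhaustive) search.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def permutation(u: Point, permutants: Sequence[Point]) -> List[int]:
    """Permutant indices sorted by increasing distance from u."""
    return sorted(range(len(permutants)), key=lambda i: math.dist(u, permutants[i]))

def footrule(a: Sequence[int], b: Sequence[int]) -> int:
    """Spearman footrule between two permutations of the same permutants."""
    pos_b = {p: i for i, p in enumerate(b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(a))

def permutation_knn(db: Sequence[Point], permutants: Sequence[Point],
                    perms: Sequence[Sequence[int]], q: Point, k: int,
                    fraction: float) -> List[Point]:
    """Approximate kNN: sort the database by footrule(perm(u), perm(q)) and
    compute real distances only for the first `fraction` of that order."""
    pq = permutation(q, permutants)  # one distance computation per permutant
    order = sorted(range(len(db)), key=lambda j: footrule(perms[j], pq))
    budget = min(len(db), max(k, math.ceil(fraction * len(db))))
    candidates = order[:budget]      # most promising elements first
    return heapq.nsmallest(k, (db[j] for j in candidates),
                           key=lambda u: math.dist(q, u))

db = [(float(i), float(i * i % 7)) for i in range(20)]
permutants = db[:4]
perms = [permutation(u, permutants) for u in db]
print(permutation_knn(db, permutants, perms, (3.3, 2.7), k=3, fraction=0.25))
```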

Experiments. [Figure: % retrieved vs. % of the database compared.] 93% retrieved, comparing 10% of the database; 90% retrieved, comparing 60% of the database; pivot-based algorithm: 48% retrieved.

Experiments. [Figure: % retrieved vs. % of the database compared.] 100% retrieved, comparing 15% of the database; 100% retrieved, comparing 90% of the database.

How good is our prediction? [Figure: % retrieved vs. percentage of the database compared; dimension 256, using 256 pivots.] Metric algorithms are using one of them.
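The plots report the percentage of the true answers retrieved against the percentage of the database compared. A minimal sketch of how such a curve can be computed from the footrule order; the function name and the data in the usage are hypothetical and do not reproduce the experiments. A budget b corresponds to comparing b / n of the database.

```python
from typing import List, Sequence, Set

def recall_curve(order: Sequence[int], true_answers: Set[int],
                 budgets: Sequence[int]) -> List[float]:
    """Share of the true answers (element indices) found among the first b
    elements of the permutation-based order, for each budget b."""
    curve = []
    for b in budgets:
        found = set(order[:b]) & true_answers
        curve.append(len(found) / len(true_answers))
    return curve

order = [3, 1, 4, 0, 2, 5, 6, 7, 8, 9]   # hypothetical footrule order of 10 elements
true_knn = {1, 4}                        # hypothetical exact 2-NN of the query
print(recall_curve(order, true_knn, [1, 2, 3, 10]))  # [0.0, 0.5, 1.0, 1.0]
```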

Similarities between permutations: almost the same value. [Figure omitted.]

Conclusion: a new probabilistic algorithm for proximity searching in metric spaces. Our technique is based on permutations: close elements will have similar permutations. This technique is the fastest known algorithm in high dimensions. Permutations are good predictors.

Future work: Can non-metric spaces be tackled with this technique? An approximate all-k-nearest-neighbors algorithm. Improving other metric indexes.

Thank you. Universidad Michoacana, Mexico; Universidad de Chile, Chile