An Intelligent & Incremental Approach to kNN using R-trees DJ Oneil & Esten Rye (G01)


Presentation Outline
- Motivation
- Related Work
- Problem Definition
- Approach
- Validation
- Conclusion

Motivation
- kNN is popular (GIS, AI, pattern recognition, clustering, outlier detection)
- kNN is a hard problem
- The R-tree is the industry standard (Oracle, Microsoft SQL Server, DB2, and MySQL)
- Problems with higher dimensional spaces
- GIS

Related Work
- Voronoi diagram
- Incremental approach (find k+1 using k)
- High dimensions (X-tree)
- New data structures (k-d tree, P-range tree, X-tree, SS-tree, …)

What’s Missing???
- Domain-specific classifications
- Informed, incremental approach to R-tree kNN

Problem Definition
Given: A spatial database with n objects and a query point q.
Find: The k ≤ n ranked nearest neighbors.
Objective:
- Use object classifications
- Incremental
Constraints:
- Spatial objects are stored in an R-tree
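As a reference point for the problem statement, here is a minimal brute-force sketch of "ranked k nearest neighbors" in Python. It is not the presenters' method: it scans all n objects and ignores the R-tree constraint entirely. The function name `ranked_knn` and the use of 2-D points are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumption: objects are 2-D points


def ranked_knn(objects: List[Point], q: Point, k: int) -> List[Point]:
    """Return the k objects closest to q, ordered by increasing Euclidean distance."""
    if not 0 <= k <= len(objects):
        raise ValueError("k must satisfy 0 <= k <= n")
    # Sort every object by its distance to the query point, then keep the first k.
    return sorted(objects, key=lambda p: math.dist(p, q))[:k]


nearest_two = ranked_knn([(0, 0), (3, 4), (1, 1), (10, 10)], (0, 0), 2)
```

Any index-based approach, including the one proposed in these slides, can be checked against a baseline like this on small inputs.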

Key Ideas
- Allow users to define domain-specific classifiers to decrease the search space
- Use an informed, incrementally increasing query region to decrease the search space
- Don’t worry about finding exactly k nearest neighbors

Approach
- Object classification
- Distance classification
- Incrementally increasing concentric circle query regions

Detour: R-tree

Object Classification
- Domain-specific classifiers
- Only search MBBs that contain classifications
- Adds classification dimensions
- Example: Zoning Classifier
  - {“Residential”, “Industrial”, “Commercial”}
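One way to picture "only search MBBs that contain classifications" is sketched below. This is an assumption-laden illustration, not the presenters' implementation: the slides describe adding classification dimensions to the R-tree, whereas this sketch stores a set of labels on each node and skips subtrees whose set does not intersect the requested labels. The names `Entry`, `Node`, `summarize`, and `search_by_label` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    name: str
    rect: Rect
    label: str  # classification, e.g. a zoning class


@dataclass
class Node:
    rect: Rect  # minimum bounding box (MBB) of everything below this node
    children: List["Node"] = field(default_factory=list)
    entries: List[Entry] = field(default_factory=list)
    labels: Set[str] = field(default_factory=set)


def summarize(node: Node) -> Set[str]:
    """Fill node.labels bottom-up with every classification found in its subtree."""
    node.labels = {e.label for e in node.entries}
    for child in node.children:
        node.labels |= summarize(child)
    return node.labels


def search_by_label(node: Node, wanted: Set[str]) -> List[Entry]:
    """Collect entries with a wanted label, skipping subtrees that hold none."""
    if not (node.labels & wanted):
        return []  # prune: this MBB contains no object of the wanted classes
    hits = [e for e in node.entries if e.label in wanted]
    for child in node.children:
        hits.extend(search_by_label(child, wanted))
    return hits


leaf_a = Node(rect=(0, 0, 5, 5), entries=[
    Entry("house", (1, 1, 2, 2), "Residential"),
    Entry("gas station", (3, 3, 4, 4), "Commercial"),
])
leaf_b = Node(rect=(5, 0, 10, 5), entries=[
    Entry("factory", (6, 1, 8, 3), "Industrial"),
])
root = Node(rect=(0, 0, 10, 5), children=[leaf_a, leaf_b])
summarize(root)
commercial = search_by_label(root, {"Commercial"})
```

In this toy tree, a query for "Commercial" never descends into `leaf_b`, because its label summary only contains "Industrial".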

Distance Classification
- Maps Euclidean distance/increment generator to region
- Default function
- Separate R-tree
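The slide is terse, so the following Python sketch rests on an interpretation: each region is associated with a generator that yields successively larger query radii, and a default generator is used when the query point falls in no registered region. A linear scan stands in for the separate R-tree mentioned on the slide. All names (`fixed_step`, `DEFAULT_GEN`, `radius_generator_for`) and the step sizes are hypothetical.

```python
import itertools
from typing import Callable, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]
RadiusGen = Callable[[], Iterator[float]]


def fixed_step(step: float) -> RadiusGen:
    """Build a generator factory yielding radii step, 2*step, 3*step, ..."""
    return lambda: (step * i for i in itertools.count(1))


# Assumed default: grow the query radius by one unit per iteration.
DEFAULT_GEN = fixed_step(1.0)


def contains(rect: Rect, p: Point) -> bool:
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def radius_generator_for(q: Point, regions: List[Tuple[Rect, RadiusGen]]) -> RadiusGen:
    """Pick the increment generator of the first region containing q, else the default."""
    for rect, gen in regions:  # linear scan; the slides use a separate R-tree here
        if contains(rect, q):
            return gen
    return DEFAULT_GEN


# Hypothetical setup: a dense area where small increments are enough.
regions = [((0, 0, 10, 10), fixed_step(0.5))]
dense_radii = list(itertools.islice(radius_generator_for((1, 1), regions)(), 3))
default_radii = list(itertools.islice(radius_generator_for((50, 50), regions)(), 3))
```

The design intent, as far as the slide suggests, is that dense areas can use small increments while sparse areas grow the query region faster.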

Concentric Circles
- Decrease candidate regions
- Only consider MBBs that are completely contained in the query region
- Ignore previously searched MBBs
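The three rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the presenters' algorithm: leaf MBBs are kept in a flat list rather than an R-tree, the radii are supplied by the caller, and the search stops once at least k candidates have been collected, so it may return more than k results and is not guaranteed to return the exact nearest neighbors. The names `rect_inside_circle` and `incremental_knn` are hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Leaf = Tuple[Rect, List[Point]]           # an MBB and the points stored in it


def rect_inside_circle(rect: Rect, center: Point, radius: float) -> bool:
    """A rectangle lies inside a disc exactly when all four corners do (a disc is convex)."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(math.dist(c, center) <= radius for c in corners)


def incremental_knn(leaves: List[Leaf], q: Point, k: int,
                    radii: Iterable[float]) -> List[Point]:
    """Grow the query circle until at least k candidates are found; return them ranked."""
    visited: Set[int] = set()
    found: List[Point] = []
    for r in radii:
        for i, (rect, points) in enumerate(leaves):
            # Skip MBBs already searched and MBBs not fully inside the circle.
            if i in visited or not rect_inside_circle(rect, q, r):
                continue
            visited.add(i)
            found.extend(points)
        if len(found) >= k:
            break
    return sorted(found, key=lambda p: math.dist(p, q))


leaves = [
    ((0, 0, 1, 1), [(0.5, 0.5)]),
    ((2, 2, 3, 3), [(2.5, 2.5), (2.2, 2.9)]),
    ((8, 8, 9, 9), [(8.5, 8.5)]),
]
result = incremental_knn(leaves, q=(0, 0), k=2, radii=[2, 5, 15])
```

With these inputs the first circle captures only the first MBB, the second circle adds the second MBB, and the search stops before the third MBB is ever examined.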

Algorithm Example: 3 nearest squares
1) Get distance function
2) Search…

Validation
- Find nearest gas stations (zoning example)
  - 1.7% of total searchable area of Minneapolis
- Complexity:
  - p classifiers with q classifications
  - Computational: $O(p \log_\alpha q) \cdot O(\log_\alpha n) \approx O(\log_\alpha n)$
  - Spatial: $(p q s + t)(n + \alpha \log_\alpha n + \alpha)$
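To see why the computational bound collapses to roughly $O(\log_\alpha n)$, it helps to evaluate the product for fixed p and q while n grows. The Python sketch below does only that arithmetic. It assumes α is the R-tree fan-out, which the slides do not state, and the numeric values are illustrative rather than measured.

```python
import math


def search_cost(p: int, q: int, n: int, alpha: int) -> float:
    """Evaluate p * log_alpha(q) * log_alpha(n), the product from the complexity slide."""
    classifier_cost = p * math.log(q, alpha)  # depends only on the classifiers
    tree_cost = math.log(n, alpha)            # depends on the number of objects
    return classifier_cost * tree_cost


# Illustrative values only: 3 classifiers, 8 classifications each, fan-out 4.
small = search_cost(p=3, q=8, n=4 ** 3, alpha=4)
large = search_cost(p=3, q=8, n=4 ** 6, alpha=4)
growth = large / small
```

Here n grows by a factor of 64 while the cost only doubles, because the classifier term is a constant multiplier once p and q are fixed.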

Conclusion
- Expand R-trees for kNN
- User-defined, domain-specific classifiers to decrease the search space
- User-defined incremental distance function
- Increasing Euclidean distance, concentric circles

Future Work
- Extend distance classifier to include many classifiers
- Non-Euclidean distance (e.g. speed limit)
- Combine distance classification tree with data tree
- Experiment
- Plan for incrementally upgrading existing R-tree implementations
- Determine threshold for number of classifiers and classifications

Any Questions???