Data Mining CSCI 307, Spring 2019 Lecture 23


Instance Based Learning

kD-trees
The idea: split the point set, alternating by x-coordinate and by y-coordinate.
- Split by x-coordinate: split on a vertical line so that half the points are to its left (or on it) and half to its right.
- Split by y-coordinate: split on a horizontal line so that half the points are below it (or on it) and half above.
[Figure: example kD-tree for the points (1,2), (2,4), (3,8), (4,1), (6,7), (7,5), (8,2). Root (2,4) with split direction h; internal nodes (4,1) and (6,7) with split direction v; leaves (1,2), (8,2), (3,8), (7,5). The same seven points are shown in the plane.]
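A minimal Python sketch of this alternating median split (the class and function names are illustrative, not from the lecture):

```python
class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point   # splitting point stored at this node
        self.axis = axis     # 0 = split by x (vertical line), 1 = split by y
        self.left = left
        self.right = right

def build_kd_tree(points, depth=0):
    """Build a 2-d tree by splitting on x and y alternately at the median."""
    if not points:
        return None
    axis = depth % 2                              # alternate x, y, x, y, ...
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                        # median point is the split
    return Node(points[mid], axis,
                build_kd_tree(points[:mid], depth + 1),
                build_kd_tree(points[mid + 1:], depth + 1))

tree = build_kd_tree([(1,2), (2,4), (3,8), (4,1), (6,7), (7,5), (8,2)])
```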

During the backtracking search, A's children can be discarded (already searched); B is the best estimate for the entire tree.
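The backtracking strategy this example illustrates can be sketched as follows, reusing the `Node` structure from the build sketch above (a simplified recursive version with Euclidean distance, not the lecture's code):

```python
import math

def nn_search(node, query, best=None):
    """kD-tree nearest-neighbor search with backtracking."""
    if node is None:
        return best
    if best is None or math.dist(query, node.point) < math.dist(query, best):
        best = node.point
    axis = node.axis
    near, far = ((node.left, node.right)
                 if query[axis] <= node.point[axis]
                 else (node.right, node.left))
    best = nn_search(near, query, best)     # descend toward the query first
    # Backtrack into the far subtree only if the splitting line is closer
    # than the best neighbor found so far; otherwise discard that subtree.
    if abs(query[axis] - node.point[axis]) < math.dist(query, best):
        best = nn_search(far, query, best)
    return best

print(nn_search(tree, (5, 5)))   # -> (7, 5), the nearest stored point
```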

More on kD-trees
- Complexity depends on the depth of the tree, which is given by the logarithm of the number of nodes.
- The amount of backtracking required depends on the quality (balance) of the tree ("square" vs. "skinny" nodes).
- How to build a good tree? We need to find a good split point and split direction:
  - Split direction: the direction with the greatest variance.
  - Split point: the median value along that direction. (Using the value closest to the mean, rather than the median, can be better if the data is skewed.)
- Apply recursively; a sketch of the split choice follows.
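A short sketch of this split-selection rule, assuming instances are numeric tuples (illustrative, not the lecture's code):

```python
def choose_split(points):
    """Pick the axis with greatest variance, then the median on that axis."""
    n, dims = len(points), len(points[0])
    best_axis, best_var = 0, -1.0
    for axis in range(dims):
        vals = [p[axis] for p in points]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        if var > best_var:
            best_axis, best_var = axis, var
    vals = sorted(p[best_axis] for p in points)
    split_value = vals[n // 2]   # median; the value closest to the mean
                                 # can be used instead for skewed data
    return best_axis, split_value
```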

Building Trees Incrementally
- A big advantage of instance-based learning: the classifier can be updated incrementally. Just add the new training instance!
- Can we do the same with kD-trees? Heuristic strategy (sketched below):
  - Find the leaf node containing the new instance.
  - Place the instance into the leaf if the leaf is empty.
  - Otherwise, split the leaf along its longest dimension (to preserve squareness).
- The tree should be rebuilt occasionally (e.g., if its depth grows to twice the optimum depth).
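One way the heuristic might look, assuming a bucket-leaf variant of the kD-tree in which each leaf holds a small bucket of points and records its bounding box (all names here are illustrative):

```python
CAPACITY = 4   # assumed bucket size per leaf

class Leaf:
    def __init__(self, box):
        self.box = box        # list of [lo, hi] bounds per dimension
        self.points = []

class Split:
    def __init__(self, axis, value, left, right):
        self.axis, self.value = axis, value
        self.left, self.right = left, right

def insert(node, p):
    """Return the (possibly new) subtree after inserting point p."""
    if isinstance(node, Split):               # descend to p's leaf
        if p[node.axis] <= node.value:
            node.left = insert(node.left, p)
        else:
            node.right = insert(node.right, p)
        return node
    node.points.append(p)
    if len(node.points) <= CAPACITY:          # room left in the bucket
        return node
    # Bucket overflowed: split along the region's longest dimension
    # (preserving squareness), at the median value on that axis.
    widths = [hi - lo for lo, hi in node.box]
    axis = widths.index(max(widths))
    node.points.sort(key=lambda q: q[axis])
    mid = len(node.points) // 2
    value = node.points[mid][axis]
    left = Leaf([list(b) for b in node.box])
    right = Leaf([list(b) for b in node.box])
    left.box[axis][1] = value                 # clip the two child regions
    right.box[axis][0] = value
    left.points = node.points[:mid + 1]
    right.points = node.points[mid + 1:]
    return Split(axis, value, left, right)
```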

Ball Trees
- Problem with kD-trees: the corners of the hyperrectangles.
- Observation: there is no need to ensure that regions do not overlap, so we can use balls (hyperspheres) instead of hyperrectangles.
- A ball tree organizes the data into a tree of k-dimensional hyperspheres.
- This normally gives a better fit to the data, and hence a more efficient search.

Ball Tree (16 instances)
[Figure: an example of 16 instances, the balls covering them, and the corresponding ball tree.]

Using Ball Trees
- Nearest-neighbor search is done using the same backtracking strategy as in kD-trees.
- A ball can be ruled out from consideration if the distance from the target to the ball's center exceeds the ball's radius plus the current upper bound.
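The pruning test in code form (a minimal sketch; `best_dist` is the current upper bound, i.e. the distance to the best neighbor found so far):

```python
import math

def can_prune(center, radius, target, best_dist):
    # Every point inside the ball lies at least dist(target, center) - radius
    # from the target, so the ball cannot contain a closer neighbor when
    # dist(target, center) > radius + best_dist.
    return math.dist(target, center) > radius + best_dist
```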

Building Ball Trees
- Ball trees may be built top down (like kD-trees).
- We do not have to continue until the leaf balls contain just two points: we can enforce a minimum occupancy (the same holds for kD-trees).
- The basic problem is splitting a ball into two. A simple (linear-time) split-selection strategy (sketched below):
  - Choose the point farthest from the ball's center.
  - Choose a second point, farthest from the first one.
  - Assign each remaining point to whichever of these two points is closer.
  - Compute cluster centers and radii from the two subsets to get the two balls.
- Tighter balls require more computation.
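A sketch of this linear-time split, assuming points are stored as tuples (function and variable names are illustrative):

```python
import math

def split_ball(points, center):
    """Split one ball's points into two balls via the two-pivot heuristic."""
    p1 = max(points, key=lambda p: math.dist(p, center))  # farthest from center
    p2 = max(points, key=lambda p: math.dist(p, p1))      # farthest from p1
    subsets = ([], [])
    for p in points:
        # assign each point to the closer of the two pivots
        subsets[0 if math.dist(p, p1) <= math.dist(p, p2) else 1].append(p)
    balls = []
    for subset in subsets:
        c = tuple(sum(xs) / len(xs) for xs in zip(*subset))  # subset centroid
        r = max(math.dist(p, c) for p in subset)             # covering radius
        balls.append((c, r, subset))
    return balls
```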

Discussion of Nearest-Neighbor Learning
- Simple and fairly accurate, but it assumes all attributes are equally important.
- Statisticians have used nearest-neighbor methods since the early 1950s.
- kD-trees become inefficient when the number of attributes grows too large (roughly more than 10).
- Ball trees work well in higher-dimensional spaces.