VLDB 2006, Seoul. Indexing For Function Approximation. Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew. Cornell University.

Presentation transcript:

Motivation
Simulations are important in science
Large simulations computationally infeasible
–Driven by complex mathematical models
–Require solution to complex differential equations
Approximation techniques speed up simulations
–Bounded error in the simulation
–Approximate simulation steps using information from previous steps

Outline
Example scientific application
–Combustion simulation
Function approximation problem
–Formulation
–Hardness
–Algorithm
Indexing problem

Combustion Simulation
[Figure: inflow of air and methane, mixing and reaction of air + methane described by a high dimensional composition vector, outflow]

Properties Of Simulation
Composition dimensionality
–9 for simple hydrogen simulations
–>50 for complex methane simulations
Cost of reaction function evaluation: 30 ms
Number of function evaluations: $10^8$ to [upper bound missing in transcript]
Total simulation time
–$10^8$ function evaluations ≈ 35 days
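The 35-day figure follows directly from the two numbers on the slide. A minimal arithmetic check, assuming every evaluation costs exactly 30 ms and nothing else contributes to the run time (the function and constant names are illustrative only):

```python
# Back-of-the-envelope check of the slide's figures.
EVAL_COST_SECONDS = 0.030        # 30 ms per reaction-function evaluation (from the slide)
NUM_EVALUATIONS = 10**8          # lower end of the evaluation count (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60


def simulation_days(num_evaluations: int, eval_cost_seconds: float) -> float:
    """Total days needed if every evaluation is computed directly."""
    return num_evaluations * eval_cost_seconds / SECONDS_PER_DAY


days = simulation_days(NUM_EVALUATIONS, EVAL_COST_SECONDS)
print(f"{days:.1f} days")
```

This gives about 3 × 10^6 seconds, a little under 35 days, which matches the slide's estimate.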

Function Approximation
Approximate the reaction function
Approach
–Use previous function evaluations to approximate future function evaluations
–ISAT (In Situ Adaptive Tabulation) [Pope '97]
Definition: ε-approximation of f(x)
–Let $f: \mathbb{R}^m \to \mathbb{R}^n$ be a function, let $x \in \mathbb{R}^m$ and $\varepsilon \in \mathbb{R}$. $f^*(x)$ is an ε-approximation of $f(x)$ if $\| f^*(x) - f(x) \| < \varepsilon$
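The definition can be read as a simple predicate. A minimal sketch, assuming the Euclidean norm (the slide does not say which norm is meant) and a hypothetical function name:

```python
import math
from typing import Sequence


def is_eps_approximation(f_star_x: Sequence[float],
                         f_x: Sequence[float],
                         eps: float) -> bool:
    """Return True if ||f*(x) - f(x)|| < eps.

    Assumption: the norm is Euclidean; the slide leaves it unspecified.
    """
    return math.dist(f_star_x, f_x) < eps


# The approximate value differs from the exact one by 0.05 in one component.
print(is_eps_approximation([1.0, 2.0], [1.0, 2.05], eps=0.1))
```

Note that the inequality is strict, so an error exactly equal to ε does not qualify.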

Challenges In Function Approximation
How can a suitable f* be found?
Global Approach
–Single model to approximate the entire function
–Neural Networks
Local Approach
–Divide the function into parts and model each part separately
–Each part may be modeled more accurately and simply
–Need not learn the entire function: only the part needed by the simulation

Example
[Figure: function f; axis label: Cost]

Example
[Figure: linear approximation $f^*(x_2) = f(x) + s \cdot (x_2 - x)$ through the point $(x, f(x))$ with error band ε, showing query points $x_1$, $x_2$ and an ε-Local Region $R_{f,f^*}(x, \varepsilon) \subseteq \mathbb{R}^m$; labels: Original Cost, Cost, f]

Example
[Figure: query points $x_1, \dots, x_6$; function f with local approximations $f_1^*, f_2^*, f_3^*$; labels: Original Cost, Cost]

Example
When should a local region be added?
[Figure: query points $x_1, \dots, x_6$; function f with local approximations $f_1^*, f_2^*, f_3^*$]

Example
Each query point can be covered by several Local Regions
[Figure: query points $x_1, \dots, x_8$; function f with local approximations $f_1^*, \dots, f_4^*$]

Local Region
$\forall x'$: $x_3 < x' < x_4$, $f^*(x')$ is an ε-approximation of $f(x')$
Interval $[x_3, x_4]$ defines a Local Region around x
[Figure: linear approximation $f^*(x_2) = f(x) + s \cdot (x_2 - x)$ through the point $(x, f(x))$, with error band ε between $x_3$ and $x_4$]

Definition: Local Region
An ε-Local Region $R_{f,f^*}(x, \varepsilon) \subseteq \mathbb{R}^m$ for function f based on approximation f* at point x is a maximal connected region containing $x \in \mathbb{R}^m$ such that $\forall x' \in R_{f,f^*}(x, \varepsilon)$: $f^*(x')$ is an ε-approximation of $f(x')$
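For intuition, the one-dimensional case of this definition can be explored numerically. The sketch below is an illustration under stated assumptions, not the method used in the talk: it takes the linear approximation f*(x') = f(x) + s · (x' − x) from the earlier slides and walks outward from x on a fixed grid until the error reaches ε. Only grid points are checked, so the result approximates the maximal connected interval; `local_region_1d`, `step`, and `max_steps` are hypothetical names.

```python
from typing import Callable, Tuple


def local_region_1d(f: Callable[[float], float],
                    x0: float,
                    slope: float,
                    eps: float,
                    step: float = 1e-3,
                    max_steps: int = 100_000) -> Tuple[float, float]:
    """Approximate the eps-Local Region around x0 as an interval (lo, hi).

    f*(x) = f(x0) + slope * (x - x0) is the local linear model.
    Assumption: checking grid points spaced `step` apart is good enough.
    """
    fx0 = f(x0)

    def within_tolerance(x: float) -> bool:
        return abs(fx0 + slope * (x - x0) - f(x)) < eps

    # Walk right, then left, stopping at the first grid point that fails.
    right = 0
    while right < max_steps and within_tolerance(x0 + (right + 1) * step):
        right += 1
    left = 0
    while left < max_steps and within_tolerance(x0 - (left + 1) * step):
        left += 1
    return x0 - left * step, x0 + right * step


# f(x) = x^2 with its tangent at x0 = 1: the error is (x - 1)^2,
# so for eps = 0.01 the region is roughly the interval (0.9, 1.1).
lo, hi = local_region_1d(lambda x: x * x, x0=1.0, slope=2.0, eps=0.01)
print(lo, hi)
```

Stopping at the first failing grid point is what keeps the interval connected: a point farther out that happens to satisfy the bound is not included.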

Challenges
Finding good f*s and corresponding Local Regions
Computing a set of Local Regions
Data management: storing Local Regions for future use
Problem: Minimize total simulation time by computing and storing a set of Local Regions

Function Approximation With Local Regions
Minimize total simulation time by computing and storing for future use an optimal set of Local Regions
Computing an optimal set of Local Regions
–Interesting optimization problem
Cost associated with Local Regions
–Function evaluation for one point in region
–Estimating the extent of the region
Query distribution determines utility of a Local Region
Overlapping Local Regions

Finding The Optimal Set Of Local Regions
Simplified cost model
–Both the function value and Local Region at a point can be obtained at some constant cost equal across all regions
–Approximations have zero cost
Offline Problem
–Given a set $X = \{x_1, x_2, \dots, x_n\}$ of query points, find the smallest set $L = \{l_1, l_2, \dots, l_k\}$ of Local Regions, such that for each $x_i \in X$ there is an $l_j \in L$ which contains $x_i$
–NP-Complete: Reduction from Geometric Covering By Discs
Online Problem
–No online algorithm is competitive
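Since the offline problem is NP-complete, an exact solution is impractical for large inputs. The sketch below shows the standard greedy set-cover heuristic on a one-dimensional toy instance. It is not taken from the slides, it does not guarantee the smallest set, and it assumes a finite list of candidate regions is already given as closed intervals; `greedy_cover` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # a candidate Local Region [lo, hi] in 1-D


def greedy_cover(points: Sequence[float],
                 regions: Sequence[Interval]) -> List[int]:
    """Pick region indices greedily until every query point is covered.

    Greedy heuristic: repeatedly take the region covering the most
    still-uncovered points. It may use more regions than the optimum.
    """
    uncovered = set(range(len(points)))
    chosen: List[int] = []

    def newly_covered(j: int) -> set:
        lo, hi = regions[j]
        return {i for i in uncovered if lo <= points[i] <= hi}

    while uncovered:
        best = max(range(len(regions)), key=lambda j: len(newly_covered(j)))
        gained = newly_covered(best)
        if not gained:
            raise ValueError("some query points are not covered by any region")
        chosen.append(best)
        uncovered -= gained
    return chosen


queries = [0.1, 0.2, 0.5, 0.9]
candidates = [(0.0, 0.25), (0.15, 0.6), (0.4, 1.0), (0.0, 0.15)]
print(greedy_cover(queries, candidates))
```

In the actual setting the candidate regions are not known up front, because each one costs a function evaluation to obtain, which is part of why the slides move on to the online formulation.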

Online Analysis
Offline problem is hard
What about a competitive online algorithm?
Online Problem
–For $i > 0$ let $X(i) = \{x_1, \dots, x_i\}$ and $L(i) = \{l_1, \dots, l_{k(i)}\}$, where $\forall i > 1$, $k(i)$ is some integer with $k(i-1) < k(i)$
–Find the smallest set $L(n)$ such that the following holds for each set $X(i)$
  Each $x \in X(i)$ is contained in some Local Region $l \in L(i)$
No online algorithm is competitive

Algorithm Illustration
[Figure: query points $x_1, \dots, x_8$; function f with local approximations $f_1^*, \dots, f_4^*$]

Algorithm
[Flowchart]
Initialize S
Simulation issues query x: Lookup x in S
Local Region found?
–Y (Retrieve): Return approximation
–N (Add): Evaluate function at x, add new region containing x to S

Possible Instantiation Of Local Regions
Local Regions can be approximated using high dimensional ellipsoids [Pope '97]
–Based on Taylor expansion of the function
Two step approach
–Initial conservative approximation
–Grow
[Figure: ellipsoid around x and a query point $x_1$]
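Both steps need a point-in-ellipsoid test. A minimal sketch, assuming an ellipsoid is stored as a center c and a symmetric positive-definite matrix A with membership defined by (x − c)ᵀ A (x − c) ≤ 1. The grow step shown here simply scales the ellipsoid uniformly until it contains the new point; that is a deliberate simplification for illustration, not the growth rule of ISAT, and all names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def quadratic_form(x: Sequence[float], center: Sequence[float],
                   shape: Matrix) -> float:
    """Compute (x - c)^T A (x - c)."""
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    return sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))


def contains(x: Sequence[float], center: Sequence[float],
             shape: Matrix) -> bool:
    """True if x lies inside or on the ellipsoid."""
    return quadratic_form(x, center, shape) <= 1.0


def grow_to_contain(x: Sequence[float], center: Sequence[float],
                    shape: Matrix, slack: float = 1e-9) -> Matrix:
    """Return a uniformly scaled shape matrix whose ellipsoid contains x.

    Simplification: real ISAT growth is not uniform scaling.
    `slack` keeps x strictly inside despite floating-point rounding.
    """
    q = quadratic_form(x, center, shape)
    if q <= 1.0:
        return [row[:] for row in shape]      # already inside: unchanged copy
    scale = 1.0 / (q * (1.0 + slack))
    return [[a * scale for a in row] for row in shape]


center = [0.0, 0.0]
shape = [[0.25, 0.0], [0.0, 1.0]]             # semi-axes 2 and 1
print(contains([1.5, 0.5], center, shape))
print(contains([1.5, 0.9], center, shape))
grown = grow_to_contain([1.5, 0.9], center, shape)
print(contains([1.5, 0.9], center, grown))
```

The quadratic-form test costs O(m²) per ellipsoid in m dimensions, which is one reason the number of ellipsoids examined per query matters for the indexing problem discussed later.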

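Example
[Figure: region around x with query points $x_1$ and $x_2$; ε' < ε]

Example
[Figure: region around x with points $x'_1$ and $x'_2$; ε' < ε]

Example
[Figure: region around x with points $x'_1$ and $x'_2$; error bands ε and ε' < ε]

Example
[Figure: query points $x_1$, $x_2$, $x_9$, $x_{10}$]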

Updating Existing Regions
[Flowchart]
After a failed lookup (N): Evaluate function at x
Can existing region contain x?
–Y (Grow): Update existing regions to contain x
–N: Add new region containing x to S

[Flowchart: complete algorithm]
Initialize S
Simulation issues query x: Lookup x in S
Local Region found?
–Y (Retrieve): Return approximation
–N: Evaluate function at x
  Can existing region contain x?
  –Y (Grow): Update existing regions to contain x
  –N (Add): Add new region containing x to S
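The retrieve/grow/add loop in the flowchart can be sketched in a few lines. This is a one-dimensional toy under several assumptions that are not in the slides: regions are intervals instead of ellipsoids, the local model is linear with a caller-supplied derivative, the store S is a plain list scanned linearly, and a new region starts with a small fixed radius `init_radius`. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    lo: float       # current extent of the region
    hi: float
    x0: float       # tabulated point
    fx0: float      # exact function value at x0
    slope: float    # derivative at x0, used for the linear model

    def approx(self, x: float) -> float:
        return self.fx0 + self.slope * (x - self.x0)


def query(x: float, f: Callable[[float], float], df: Callable[[float], float],
          store: List[Region], eps: float,
          init_radius: float = 0.01) -> Tuple[float, str]:
    """Answer one simulation query and report which branch was taken."""
    # Retrieve: some stored region already contains x.
    for r in store:
        if r.lo <= x <= r.hi:
            return r.approx(x), "retrieve"
    # Miss: pay for an exact evaluation.
    fx = f(x)
    # Grow: an existing model is accurate at x, so extend its region.
    for r in store:
        if abs(r.approx(x) - fx) < eps:
            r.lo, r.hi = min(r.lo, x), max(r.hi, x)
            return fx, "grow"
    # Add: start a new, conservatively small region around x.
    store.append(Region(x - init_radius, x + init_radius, x, fx, df(x)))
    return fx, "add"


store: List[Region] = []
f = lambda x: x * x
df = lambda x: 2.0 * x
for x in [1.0, 1.005, 1.05, 1.04, 3.0]:
    value, branch = query(x, f, df, store, eps=0.01)
    print(x, branch, value)
```

As in the flowchart, growing relies on the error check at a single new point, so a grown region is a heuristic estimate of where the model is accurate, not a guarantee for every point in between.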

Outline
Example scientific application
–Combustion simulation
Function approximation problem
–Formulation
–Hardness
–Algorithm
Indexing problem

Indexing Problem
Workload
–Retrieve: Find ellipsoid containing query point
–Grow
  Find ellipsoids to be grown
  Update grown ellipsoids
–Add: Insert a new ellipsoid

New Indexing Problem
Shape of regions
Updates and queries interleaved
Additional costs: ellipsoid maintenance costs
Overall aim: Reduce total simulation time
Retrieve/grow/add are all optional
–Tuning parameters at each step
Operation      Cost
Evaluation     2000
Addition       1200
Grow           10
Approximation  1
Search         1

Approach
Goal: Understand index selection for function approximation
Empirically compare different index structures
Develop a cost model
–Accounting for all cost components
–Qualitatively explain variations in performance of index structures

Outline
Example scientific application
–Combustion simulation
Function approximation problem
–Formulation
–Hardness
–Algorithm
Indexing problem
–Cost structure, tuning parameters and effects
–Index structures and experiments

Grow Effects
$C_{miss} = t_f + t_{growsearch} + I_{grow} \cdot C_{grow} + (1 - I_{grow}) \cdot C_{add}$
Tuning Parameter: Ellg
–Limit on number of ellipsoids examined for growing
–No pruning criteria
–Affects
  $t_{growsearch}$
  Chance of finding a growable ellipsoid
Tuning Parameter: $N_{grown}$
–Number of ellipsoids grown per step
–Affects
  $C_{grow}$
  Structure of the index (overlapping ellipsoids)

Retrieve Effects
$C_{tot} = t_{search} + I_{ret} \cdot t_{la} + (1 - I_{ret}) \cdot C_{miss}$
Tuning Parameter: Ellr
–Limit on number of ellipsoids examined during retrieve
–Limits how much of the index is searched
–Affects
  $t_{search}$
  Chances of a current retrieve and also future retrieves

Retrieve Effects
$C_{tot} = t_{search} + I_{ret} \cdot t_{la} + (1 - I_{ret}) \cdot C_{miss}$
Number of ellipsoids examined during retrieve (Ellr)
–Increasing Ellr
  (+) Increases chances of a retrieve
  (-) Reduces the benefit of the retrieve (increases $t_{search}$)
–Missed retrieves cause grows and adds which affect future retrieves
  Index new parts of the domain
  Change structure of the index

Add Effects
$C_{miss} = t_f + t_{growsearch} + I_{grow} \cdot C_{grow} + (1 - I_{grow}) \cdot C_{add}$
Tuning parameter: Indirectly controlled by retrieves and grows
–Affects
  Should query point be covered by an add or grow?
  (-) Computing new ellipsoids is expensive
  (-) New ellipsoids cover smaller part of the domain
  (+) May lead to better ellipsoid distribution
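The two cost formulas can be evaluated directly to see how strongly a miss dominates. The sketch below makes several assumptions that the slides do not state: $I_{ret}$ and $I_{grow}$ are read as 0/1 indicators for a single query (or as probabilities when averaging), the relative costs from the operation table are mapped as Evaluation → $t_f$, Addition → $C_{add}$, Grow → $C_{grow}$, Approximation → $t_{la}$, Search → $t_{search}$, and $t_{growsearch}$ is set to 1 purely for illustration.

```python
def miss_cost(t_f: float, t_growsearch: float, i_grow: float,
              c_grow: float, c_add: float) -> float:
    """C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add."""
    return t_f + t_growsearch + i_grow * c_grow + (1 - i_grow) * c_add


def total_cost(t_search: float, i_ret: float, t_la: float,
               c_miss: float) -> float:
    """C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss."""
    return t_search + i_ret * t_la + (1 - i_ret) * c_miss


# Assumed mapping of the relative costs in the operation table.
T_F, C_ADD, C_GROW, T_LA, T_SEARCH = 2000, 1200, 10, 1, 1
T_GROWSEARCH = 1  # illustrative value, not from the slides

hit = total_cost(T_SEARCH, 1, T_LA, 0)
miss_then_grow = total_cost(
    T_SEARCH, 0, T_LA, miss_cost(T_F, T_GROWSEARCH, 1, C_GROW, C_ADD))
miss_then_add = total_cost(
    T_SEARCH, 0, T_LA, miss_cost(T_F, T_GROWSEARCH, 0, C_GROW, C_ADD))
print(hit, miss_then_grow, miss_then_add)
```

Under these assumed numbers a successful retrieve is roughly three orders of magnitude cheaper than either kind of miss, which is why even a modest increase in search effort can pay off if it raises the retrieve rate.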

Candidate Index Structures
Bounding Box Rtree
Point Rtree
Ellipsoid Rtree
Random Projection Rtree
Binary Tree
MRU List + Rtree

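Binary Tree: Primary Retrieve
[Figure: ellipsoids A, B, C; binary tree with internal nodes 1 and 2 and leaves A, B, C; query point q]

Binary Tree: Secondary Retrieve
[Figure: ellipsoids A, B, C; binary tree with internal nodes 1 and 2 and leaves A, B, C; query point q]

Binary Tree
[Figure: ellipsoids A, B, C; binary tree with internal nodes 1 and 2 and leaves A, B, C]

Binary Tree: Secondary Retrieve now Primary Retrieve
[Figure: ellipsoids A, B, C, D; binary tree with internal nodes 1 and 2 after the update]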

Effects In Action: Binary Tree
32 dimensional methane simulation
$6 \times 10^6$ queries
Windows XP machine (2.4 GHz, 2 GB)

MRU List + Rtree
MRU List for retrieving
–High locality
Rtree for searching growable ellipsoids
[Figure: MRU List and Rtree]
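The retrieve side of this combination can be sketched as a move-to-front list. This is an illustration under assumptions, not the implementation evaluated in the talk: regions are one-dimensional intervals, the containment test is passed in by the caller, the `limit` argument stands in for the Ellr bound on ellipsoids examined per retrieve, and the Rtree used for finding growable ellipsoids is left out entirely.

```python
from typing import Callable, List, Optional, TypeVar

R = TypeVar("R")


class MRUList:
    """Most-recently-used list: hits move to the front."""

    def __init__(self) -> None:
        self._items: List[R] = []

    def add(self, region: R) -> None:
        # New regions are assumed to be the most likely to be hit next.
        self._items.insert(0, region)

    def retrieve(self, x: float, contains: Callable[[R, float], bool],
                 limit: Optional[int] = None) -> Optional[R]:
        """Scan at most `limit` regions from the front; None means a miss."""
        for pos, region in enumerate(self._items[:limit]):
            if contains(region, x):
                self._items.insert(0, self._items.pop(pos))
                return region
        return None

    def order(self) -> List[R]:
        return list(self._items)


def in_interval(region, x):
    return region[0] <= x <= region[1]


mru = MRUList()
for interval in [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]:
    mru.add(interval)
print(mru.retrieve(0.5, in_interval))           # hit, moved to the front
print(mru.retrieve(2.5, in_interval, limit=2))  # not among the first two
print(mru.order())
```

When consecutive queries fall in the same few regions, which is the high locality the slide mentions, most retrieves succeed within the first few entries, so a small limit costs little in hit rate.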

Effects In Action: MRU List + Rtree
Effects very different from Binary Tree

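Total Simulation Times
[Table: total simulation time by index type and error tolerance; most cell values are missing from the transcript]
Index types: Binary Tree (tuned), MRU List + Rtree, Bbox Rtree, Random Projection Rtree, Binary Tree (default), FIFO List + Rtree, Point Rtree, Ellipsoidal Rtree
Surviving values: Point Rtree 10431, > [truncated]; Ellipsoidal Rtree 14328, >44000, -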

Conclusion & Future Work
Formulated the function approximation problem
New class of applications for high dimensional indexing
Understand index selection for function approximation
Future work
–Dynamic parameter settings
–New benchmark for index structures
–Evaluation of other index structures
–Comparison with other function approximation techniques

Questions?