VLDB 2006, Seoul
Indexing For Function Approximation
Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew
Cornell University

Motivation
Simulations are important in science
Large simulations are computationally infeasible
  – Driven by complex mathematical models
  – Require solutions to complex differential equations
Approximation techniques speed up simulations
  – Bounded error in the simulation
  – Approximate simulation steps using information from previous steps

Outline
Example scientific application
  – Combustion simulation
Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
Indexing problem

Combustion Simulation
[Figure: inflow of air and methane, mixing and reaction (air + methane), and outflow. Label: High Dimensional Composition Vector.]

Properties Of Simulation
Composition dimensionality
  – 9 for simple hydrogen simulations
  – >50 for complex methane simulations
Cost of reaction function evaluation: 30 ms
Number of function evaluations: $10^8$ to $10^{10}$
Total simulation time
  – $10^8$ function evaluations ≈ 35 days
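
The 35-day figure follows directly from the per-evaluation cost. The short sketch below redoes that arithmetic, assuming every simulation step needs one direct evaluation at the 30 ms quoted on the slide; the function name is illustrative only.

```python
# Back-of-the-envelope check of the slide's estimate.
EVAL_COST_SECONDS = 0.030        # 30 ms per reaction function evaluation (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60


def simulation_days(num_evaluations: float) -> float:
    """Total time in days if every step needs a direct function evaluation."""
    return num_evaluations * EVAL_COST_SECONDS / SECONDS_PER_DAY


if __name__ == "__main__":
    for n in (1e8, 1e10):
        print(f"{n:.0e} evaluations: {simulation_days(n):.1f} days")
```

At the upper end of the range the same arithmetic gives years rather than days, which is the motivation for approximating the function.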

Function Approximation
Approximate the reaction function
Approach
  – Use previous function evaluations to approximate future function evaluations
  – ISAT (In Situ Adaptive Tabulation) [Pope '97]
Definition: ε-approximation of f(x)
  – Let $f: \mathbb{R}^m \to \mathbb{R}^n$ be a function, let $x \in \mathbb{R}^m$ and $\varepsilon \in \mathbb{R}$. $f^*(x)$ is an ε-approximation of $f(x)$ if $\|f^*(x) - f(x)\| < \varepsilon$
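
The definition translates into a one-line test. The sketch below assumes the Euclidean norm, which the slide does not specify, and the function name is hypothetical.

```python
import math
from typing import Sequence


def is_eps_approximation(f_star_x: Sequence[float],
                         f_x: Sequence[float],
                         eps: float) -> bool:
    """Return True if ||f*(x) - f(x)|| < eps.

    Assumption: the norm is Euclidean; the slide leaves the norm open.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_star_x, f_x)))
    return diff < eps


if __name__ == "__main__":
    print(is_eps_approximation([1.0, 2.0], [1.0, 2.05], eps=0.1))
    print(is_eps_approximation([0.0, 0.0], [3.0, 4.0], eps=5.0))
```

Note that the inequality is strict, so an error exactly equal to ε does not qualify.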

Challenges In Function Approximation
How can a suitable f* be found?
Global Approach
  – Single model to approximate the entire function
  – Neural Networks
Local Approach
  – Divide the function into parts and model each part separately
  – Each part may be modeled more accurately and simply
  – Need not learn the entire function: only the part needed by the simulation

Example
[Figure: plot of the function f. Label: Cost.]

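Example
An ε-Local Region $R_{f,f^*}(x, \varepsilon) \subseteq \mathbb{R}^m$
[Figure: f with the linear approximation $f^*(x_2) = f(x) + s \cdot (x_2 - x)$ through the point $(x, f(x))$, an ε band, and the points $x_1$, $x_2$. Labels: Original Cost, Cost.]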

Example
[Figure: f with query points $x_1$ to $x_6$ and local approximations $f_1^*$, $f_2^*$, $f_3^*$. Labels: Original Cost, Cost.]

Example
When should a local region be added?
[Figure: f with query points $x_1$ to $x_6$ and local approximations $f_1^*$, $f_2^*$, $f_3^*$.]

Example
Each query point can be covered by several Local Regions
[Figure: f with query points $x_1$ to $x_8$ and local approximations $f_1^*$ to $f_4^*$.]

Local Region
$\forall x': x_3 < x' < x_4$, $f^*(x')$ is an ε-approximation of $f(x')$
The interval $[x_3, x_4]$ defines a Local Region around x
[Figure: linear approximation $f^*(x_2) = f(x) + s \cdot (x_2 - x)$ through the point $(x, f(x))$, with an ε band and the points $x_2$, $x_3$, $x_4$ marked.]

Definition: Local Region
An ε-Local Region $R_{f,f^*}(x, \varepsilon) \subseteq \mathbb{R}^m$ for function f based on approximation f* at point x is a maximal connected region containing $x \in \mathbb{R}^m$ such that $\forall x' \in R_{f,f^*}(x, \varepsilon)$: $f^*(x')$ is an ε-approximation of $f(x')$
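
For a one-dimensional function, the Local Region around x under the linear approximation f*(x') = f(x) + s (x' - x) is an interval. The sketch below estimates that interval by stepping outward from x on a fixed grid until the error reaches ε. The grid scan, the step size, and the function name are assumptions made for illustration; the slides do not prescribe how the extent of a region is found, and in practice f is too expensive to probe this densely.

```python
from typing import Callable, Tuple


def local_region_1d(f: Callable[[float], float], x: float, s: float,
                    eps: float, step: float = 1e-3,
                    max_steps: int = 100_000) -> Tuple[float, float]:
    """Approximate the interval around x where |f*(x') - f(x')| < eps.

    f*(x') = f(x) + s * (x' - x) is the linear approximation from the slide.
    The result is accurate only up to the grid resolution `step`.
    """
    fx = f(x)

    def within_tolerance(xp: float) -> bool:
        return abs(fx + s * (xp - x) - f(xp)) < eps

    lo = hi = x
    # Walk right, then left, stopping at the first grid point that fails,
    # so the region stays connected and contains x.
    for _ in range(max_steps):
        if not within_tolerance(hi + step):
            break
        hi += step
    for _ in range(max_steps):
        if not within_tolerance(lo - step):
            break
        lo -= step
    return lo, hi


if __name__ == "__main__":
    # f(x) = x^2 with the tangent slope s = 2 at x = 1: the error is (x' - 1)^2,
    # so for eps = 0.01 the region is approximately (0.9, 1.1).
    print(local_region_1d(lambda v: v * v, x=1.0, s=2.0, eps=0.01))
```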

Challenges
Finding good f*s and corresponding Local Regions
Computing a set of Local Regions
Data management: storing Local Regions for future use
Problem: Minimize total simulation time by computing and storing a set of Local Regions

Function Approximation With Local Regions
Minimize total simulation time by computing and storing for future use an optimal set of Local Regions
Computing an optimal set of Local Regions
  – Interesting optimization problem
Cost associated with Local Regions
  – Function evaluation for one point in the region
  – Estimating the extent of the region
Query distribution determines the utility of a Local Region
Overlapping Local Regions

Finding The Optimal Set Of Local Regions
Simplified cost model
  – Both the function value and the Local Region at a point can be obtained at some constant cost, equal across all regions
  – Approximations have zero cost
Offline Problem
  – Given a set $X = \{x_1, x_2, \ldots, x_n\}$ of query points, find the smallest set $L = \{l_1, l_2, \ldots, l_k\}$ of Local Regions such that for each $x_i \in X$ there is an $l_j \in L$ which contains $x_i$
  – NP-Complete: Reduction from Geometric Covering By Discs
Online Problem
  – No online algorithm is competitive

Online Analysis
Offline problem is hard
What about a competitive online algorithm?
Online Problem
  – For $i > 0$ let $X(i) = \{x_1, \ldots, x_i\}$ and $L(i) = \{l_1, \ldots, l_{k(i)}\}$, where $\forall i > 1$, $k(i)$ is some integer with $k(i-1) < k(i)$
  – Find the smallest set $L(n)$ such that the following holds for each set $X(i)$: each $x \in X(i)$ is contained in some Local Region $l \in L(i)$
No online algorithm is competitive

Algorithm Illustration
[Figure: f with query points $x_1$ to $x_8$ and local approximations $f_1^*$ to $f_4^*$.]

Algorithm
[Flowchart: Initialize S. For each query x from the simulation, look up x in S. If a Local Region is found (Y), return the approximation (Retrieve). If not (N), evaluate the function at x and add a new region containing x to S (Add).]
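
The retrieve/add loop in the flowchart can be sketched in a few lines. This is a minimal illustration, not the ISAT implementation: regions are one-dimensional intervals, the lookup is a linear scan, and `Region`, `Tabulation`, and the `make_region` callback are hypothetical names standing in for whatever constructs a Local Region around an evaluated point.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Region:
    """A 1-D Local Region with its linear approximation (illustrative only)."""
    lo: float
    hi: float
    x0: float
    f0: float
    slope: float

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi

    def approximate(self, x: float) -> float:
        return self.f0 + self.slope * (x - self.x0)


class Tabulation:
    def __init__(self, f: Callable[[float], float],
                 make_region: Callable[[float, float], Region]) -> None:
        self.f = f
        self.make_region = make_region    # hypothetical region constructor
        self.regions: List[Region] = []   # the set S, initially empty
        self.evaluations = 0              # number of expensive calls to f

    def lookup(self, x: float) -> Optional[Region]:
        for region in self.regions:       # linear scan; an index would go here
            if region.contains(x):
                return region
        return None

    def query(self, x: float) -> float:
        region = self.lookup(x)
        if region is not None:            # Retrieve: return the approximation
            return region.approximate(x)
        self.evaluations += 1             # miss: evaluate the function at x
        value = self.f(x)
        self.regions.append(self.make_region(x, value))  # Add
        return value
```

The linear scan in `lookup` is the part the indexing problem later in the talk is about.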

Possible Instantiation Of Local Regions
Local Regions can be approximated using high-dimensional ellipsoids [Pope '97]
  – Based on Taylor expansion of the function
Two-step approach
  – Initial conservative approximation
  – Grow
[Figure: points x and $x_1$.]
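
With ellipsoidal regions, the basic retrieve test is a point-in-ellipsoid check. The sketch below assumes the common representation of an ellipsoid as the set of points x with (x - c)^T A (x - c) <= 1 for a center c and a symmetric positive definite matrix A; the slides do not state which representation is used, and the function name is hypothetical.

```python
from typing import Sequence


def in_ellipsoid(x: Sequence[float], center: Sequence[float],
                 A: Sequence[Sequence[float]]) -> bool:
    """Return True if (x - c)^T A (x - c) <= 1.

    Assumption: A is symmetric positive definite, so the set is an ellipsoid.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    Ad = [sum(a_ij * d_j for a_ij, d_j in zip(row, d)) for row in A]
    return sum(d_i * ad_i for d_i, ad_i in zip(d, Ad)) <= 1.0


if __name__ == "__main__":
    # Axis-aligned ellipse with semi-axes 2 (first coordinate) and 1 (second).
    A = [[0.25, 0.0], [0.0, 1.0]]
    print(in_ellipsoid([1.5, 0.0], [0.0, 0.0], A))
    print(in_ellipsoid([0.0, 1.5], [0.0, 0.0], A))
```

Each test costs a matrix-vector product, so in 50 dimensions the number of ellipsoids examined per query matters.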

Example
[Figure: points x, $x_1$, $x_2$. Label: ε' < ε.]

Example
[Figure: points x, $x'_1$, $x'_2$. Label: ε' < ε.]

Example
[Figure: points x, $x'_1$, $x'_2$. Labels: ε, ε' < ε.]

Example
[Figure: points $x_1$, $x_2$, $x_9$, $x_{10}$.]

Updating Existing Regions
[Flowchart: On a miss (N), evaluate the function at x. Can an existing region contain x? If yes (Y), update existing regions to contain x (Grow). If no (N), add a new region containing x to S.]
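
The grow-or-add decision on a miss might look like the sketch below. It makes several assumptions that the slides leave open: regions are one-dimensional intervals, a region "can contain x" when its linear approximation at x is within ε of the freshly evaluated f(x), and growing simply stretches the interval to x. The names `IntervalRegion`, `try_grow`, `handle_miss`, and `make_region` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IntervalRegion:
    lo: float
    hi: float
    x0: float
    f0: float
    slope: float

    def approximate(self, x: float) -> float:
        return self.f0 + self.slope * (x - self.x0)


def try_grow(region: IntervalRegion, x: float, fx: float, eps: float) -> bool:
    """Stretch the region to x if its approximation at x is within eps of f(x)."""
    if abs(region.approximate(x) - fx) < eps:
        region.lo = min(region.lo, x)
        region.hi = max(region.hi, x)
        return True
    return False


def handle_miss(regions: List[IntervalRegion], x: float, fx: float, eps: float,
                make_region: Callable[[float, float], IntervalRegion],
                max_grown: int = 1) -> str:
    """After evaluating f(x) on a miss: grow existing regions, else add one."""
    grown = 0
    for region in regions:
        if grown >= max_grown:           # cap on regions grown per step
            break
        if try_grow(region, x, fx, eps):
            grown += 1
    if grown == 0:
        regions.append(make_region(x, fx))
        return "add"
    return "grow"
```

Growing reuses an evaluation that has already been paid for, which is why it is far cheaper than building a new region.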

[Flowchart: Initialize S. For each query x from the simulation, look up x in S. If a Local Region is found (Y), return the approximation (Retrieve). Otherwise (N), evaluate the function at x. If an existing region can contain x (Y), update existing regions to contain x (Grow); otherwise (N), add a new region containing x to S (Add).]

Outline
Example scientific application
  – Combustion simulation
Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
Indexing problem

Indexing Problem
Workload
  – Retrieve: Find ellipsoid containing query point
  – Grow: Find ellipsoids to be grown; update grown ellipsoids
  – Add: Insert a new ellipsoid

New Indexing Problem
Shape of regions
Updates and queries interleaved
Additional costs: ellipsoid maintenance costs
Overall aim: Reduce total simulation time
Retrieve/grow/add are all optional
  – Tuning parameters at each step

Operation     | Cost
Evaluation    | 2000
Addition      | 1200
Grow          | 10
Approximation | 1
Search        | 1

Approach
Goal: Understand index selection for function approximation
Empirically compare different index structures
Develop a cost model
  – Accounting for all cost components
  – Qualitatively explain variations in performance of index structures

Outline
Example scientific application
  – Combustion simulation
Function approximation problem
  – Formulation
  – Hardness
  – Algorithm
Indexing problem
  – Cost structure, tuning parameters and effects
  – Index structures and experiments

Grow Effects
$C_{miss} = t_f + t_{growsearch} + I_{grow} \cdot C_{grow} + (1 - I_{grow}) \cdot C_{add}$
Tuning Parameter: Ellg
  – Limit on number of ellipsoids examined for growing
  – No pruning criteria
  – Affects: $t_{growsearch}$; chance of finding a growable ellipsoid
Tuning Parameter: $N_{grown}$
  – Number of ellipsoids grown per step
  – Affects: $C_{grow}$; structure of the index (overlapping ellipsoids)

Retrieve Effects
$C_{tot} = t_{search} + I_{ret} \cdot t_{la} + (1 - I_{ret}) \cdot C_{miss}$
Tuning Parameter: Ellr
  – Limit on number of ellipsoids examined during retrieve
  – Limits how much of the index is searched
  – Affects: $t_{search}$; chances of a current retrieve and also future retrieves
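
The two cost formulas compose directly. The sketch below evaluates them for the three possible outcomes of a query, treating I_ret and I_grow as 0/1 indicators. The numeric values are taken from the operation cost table on the "New Indexing Problem" slide, but mapping them onto the symbols (t_f = Evaluation, C_add = Addition, C_grow = Grow, t_la = Approximation, t_search = Search) and setting t_growsearch equal to the Search cost are assumptions made here for illustration.

```python
def miss_cost(t_f: float, t_growsearch: float, i_grow: int,
              c_grow: float, c_add: float) -> float:
    """C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add."""
    return t_f + t_growsearch + i_grow * c_grow + (1 - i_grow) * c_add


def total_cost(t_search: float, i_ret: int, t_la: float, c_miss: float) -> float:
    """C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss."""
    return t_search + i_ret * t_la + (1 - i_ret) * c_miss


# Assumed mapping of the slide's cost table onto the symbols (see lead-in).
T_F, C_ADD, C_GROW, T_LA, T_SEARCH, T_GROWSEARCH = 2000, 1200, 10, 1, 1, 1

if __name__ == "__main__":
    hit = total_cost(T_SEARCH, 1, T_LA, c_miss=0)
    grow = total_cost(T_SEARCH, 0, T_LA,
                      miss_cost(T_F, T_GROWSEARCH, 1, C_GROW, C_ADD))
    add = total_cost(T_SEARCH, 0, T_LA,
                     miss_cost(T_F, T_GROWSEARCH, 0, C_GROW, C_ADD))
    print(hit, grow, add)  # prints: 2 2012 3202
```

Under these assumed values a miss is roughly three orders of magnitude more expensive than a hit, so spending more on search to raise the hit rate can pay off.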

Retrieve Effects
$C_{tot} = t_{search} + I_{ret} \cdot t_{la} + (1 - I_{ret}) \cdot C_{miss}$
Number of ellipsoids examined during retrieve (Ellr)
  – Increasing Ellr
    (+) Increases chances of a retrieve
    (-) Reduces the benefit of the retrieve (increases $t_{search}$)
  – Missed retrieves cause grows and adds, which affect future retrieves
    Index new parts of the domain
    Change structure of the index

Add Effects
$C_{miss} = t_f + t_{growsearch} + I_{grow} \cdot C_{grow} + (1 - I_{grow}) \cdot C_{add}$
Tuning parameter: Indirectly controlled by retrieves and grows
  – Affects: Should the query point be covered by an add or a grow?
    (-) Computing new ellipsoids is expensive
    (-) New ellipsoids cover a smaller part of the domain
    (+) May lead to better ellipsoid distribution

Candidate Index Structures
Bounding Box Rtree
Point Rtree
Ellipsoid Rtree
Random Projection Rtree
Binary Tree
MRU List + Rtree

Binary Tree: Primary Retrieve
[Figure: regions A, B, C, tree nodes 1 and 2, and query point q.]

Binary Tree: Secondary Retrieve
[Figure: regions A, B, C, tree nodes 1 and 2, and query point q.]

Binary Tree
[Figure: regions A, B, C and tree nodes 1 and 2.]

Binary Tree: Secondary Retrieve now Primary Retrieve
[Figure: regions A, B, C, D and tree nodes 1, 2, 3.]

Effects In Action: Binary Tree
32-dimensional methane simulation
$6 \times 10^6$ queries
Windows XP machine (2.4 GHz, 2 GB)

MRU List + Rtree
MRU List for retrieving
  – High locality
Rtree for searching growable ellipsoids
[Figure: MRU List and Rtree.]
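
A most-recently-used list exploits query locality by checking the regions that matched most recently first. The sketch below is one plausible reading, not the structure used in the experiments: a plain Python list with move-to-front on a hit, and an optional `limit` on how many entries are examined, playing the role of the Ellr tuning parameter. The class and method names are hypothetical.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class MRUList(Generic[T]):
    """Most-recently-used list: recently matched items are examined first."""

    def __init__(self) -> None:
        self.items: List[T] = []

    def add(self, item: T) -> None:
        self.items.insert(0, item)        # new regions start at the front

    def find(self, matches: Callable[[T], bool],
             limit: Optional[int] = None) -> Optional[T]:
        """Return the first matching item among the first `limit` entries.

        A hit is moved to the front so that nearby future queries find it fast.
        """
        n = len(self.items) if limit is None else min(limit, len(self.items))
        for i in range(n):
            item = self.items[i]
            if matches(item):
                self.items.insert(0, self.items.pop(i))
                return item
        return None


if __name__ == "__main__":
    mru: MRUList = MRUList()
    for interval in [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]:
        mru.add(interval)
    print(mru.find(lambda r: r[0] <= 0.5 <= r[1]))
    print(mru.items)
```

With a small `limit`, a retrieve can fail even though a covering region exists further down the list, which is the trade-off described on the Retrieve Effects slides.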

Effects In Action: MRU List + Rtree
Effects very different from Binary Tree

Total Simulation Times

Index Type              | Error Tolerance 0.005 | 0.00005 | 0.00004
Binary Tree (tuned)     | 1073                  | 10181   | 13100
MRU List + Rtree        | 1125                  | 14000   | 19920
Bbox Rtree              | 1201                  | 14700   | 20850
Random Projection Rtree | 1378                  | 15800   | 22051
Binary Tree (default)   | 1344                  | 29186   | 31200
FIFO List + Rtree       | 2154                  | 33770   | 42900
Point Rtree             | 10431                 | >44000  | -
Ellipsoidal Rtree       | 14328                 | >44000  | -

Conclusion & Future Work
Formulated the function approximation problem
New class of applications for high-dimensional indexing
Understand index selection for function approximation
Future work
  – Dynamic parameter settings
  – New benchmark for index structures
  – Evaluation of other index structures
  – Comparison with other function approximation techniques

Questions?
