1 Review Given a training space T(A1,…,An, C), its feature subspace X(A1,…,An) = T[A1,…,An], a functional f: X → Reals, the distance d(x,y) ≡ |f(x) − f(y)|, and a ∈ Dom(X), we define: a's k-immediate neighborhood, skin(a,k), satisfies skin(a,k) ∩ {a} = ∅, |skin(a,k)| = k, and d(x,a) ≤ d(y,a) ∀ x ∈ skin(a,k), y ∉ skin(a,k) ∪ {a}; it gives us a set of k nearest neighbors of a. The closed k-immediate neighborhood of a, cskin(a,k), is the union of all such skin(a,k)'s (ties at the k-th distance mean skin(a,k) need not be unique). skin(a,r) satisfies skin(a,r) ∩ {a} = ∅ and d(x,a) ≤ r ∀ x ∈ skin(a,r). The closed skin(a,r), cskin(a,r), is formed from skin(a,r) in the same way. In the SMART-TV algorithm, f is the total variation TV(X, ·). The skin(a,k) is found by a linear search and maintained in a heap structure. The problem with a linear search to find skin(a,k) is that it may not scale when the dataset is very large.
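
A minimal sketch (not from the original SMART-TV code) of the linear, heap-based skin(a,k) search described above: the dataset is assumed to be an in-memory Python list, f is whatever functional induces the distance (TV in SMART-TV), and find_skin is an illustrative name.

import heapq

def find_skin(X, a, k, f):
    # Linear scan for skin(a,k): the k points of X nearest to a under the
    # functional-induced distance d(x,a) = |f(x) - f(a)|.
    # A max-heap of size k (distances stored negated) holds the current best.
    fa = f(a)
    heap = []                                  # entries are (-distance, index)
    for i, x in enumerate(X):
        if x is a:                             # skin(a,k) excludes a itself
            continue
        d = abs(f(x) - fa)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap)  # (distance, index), nearest first

Every call still touches all of X once, which is exactly the scan that stops scaling on very large datasets.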

2 Improvement The improvements made to SMART-TV:
P-trees on the derived attribute TV are created on the fly, right after the computation of TV(X,x) for every x ∈ X. The P-tree mask of cskin(a,k), or of cskin(a,r) in contour(TV, cskin(a,r)), is identified quickly using an equal-interval neighborhood strategy. The candidates in the resulting mask with d(x,a) < threshold, or the closest k of them, are selected, and those selected near neighbors vote with RDF-weighted votes.
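
A rough sketch of this pruning-and-voting step, under assumptions that go beyond the slide: precomputed TV values in a NumPy array stand in for the derived-attribute P-trees, a boolean interval mask stands in for the P-tree mask of the contour, and the RDF weight is taken to be a Gaussian of the distance. classify_with_tv_contour, epsilon, r, and sigma are hypothetical names and parameters.

import numpy as np

def classify_with_tv_contour(tv_values, labels, X, a, tv_a, epsilon, r, sigma=1.0):
    # Step 1: equal-interval pruning - keep only tuples whose TV value lies in
    # [tv_a - epsilon, tv_a + epsilon]; this mask plays the role of the P-tree
    # mask of the TV contour around a.
    mask = np.abs(tv_values - tv_a) <= epsilon
    # Step 2: among those candidates, keep the true near neighbors
    # (d(x,a) <= r) and let them vote with distance-weighted (RDF-style) votes.
    votes = {}
    for i in np.nonzero(mask)[0]:
        d = np.linalg.norm(X[i] - a)
        if d > r:
            continue
        w = np.exp(-(d * d) / (2.0 * sigma * sigma))
        votes[labels[i]] = votes.get(labels[i], 0.0) + w
    return max(votes, key=votes.get) if votes else None

The intended benefit, per the slide, is that the interval mask is identified quickly (via P-tree operations in the real algorithm), so only the small candidate set incurs full distance computations and voting.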

3 Log of Total Variations
In a large dataset, TV values can be very big numbers. Could we use a log function on TV to reduce the values? We know that the log function can be used to transform large real numbers into small real numbers, e.g., log2(1,000,000) ≈ 19.9. The transformation eases the choice of ε for contour(TV, skin(a, ε)).
(Chart on slide: TV vs. LOG2(TV).)
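
A small illustration of the compression the log transform gives; the TV magnitudes below are made up, since the slide's own numeric example did not survive the transcript.

import math

for tv in (1.2e4, 3.5e6, 9.0e8):               # hypothetical TV magnitudes
    print(f"TV = {tv:.0f}  ->  log2(TV) = {math.log2(tv):.2f}")
# e.g. 12000 -> 13.55, 3500000 -> 21.74, 900000000 -> 29.75

After the transform, TV values that differ by orders of magnitude land in a narrow range, which is what makes a single ε workable for the contour.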

4 Results Using RSI Datasets
KNN loading time (seconds): 32M: 553, 64M: 1024, 96M: 1536.

Dataset   SMART-TV w/ P-tree   SMART-TV w/ Scan   KNN
32M       5.77                 35.12              296.90
64M       11.66                70.30              593.71
96M       17.42                106.76             891.58

Note: Datasets for KNN were loaded partially since they could not fit entirely in memory. Observed on the Midas-15 machine (P4, 3.8 GHz), with k = 5 and ε = 0.005.

5 Preprocessing
Preprocessing time in seconds:
Dataset   All COUNTs   A single COUNT   TVs
32M       28.73        0.09             13,440
64M       57.55        0.18             27,072
96M       87.05        0.27             40,608
Observed on the Midas-15 machine (P4, 3.8 GHz). The complexity of getting the COUNTs is O(d·b²), where d is the number of dimensions and b is the average bit width; for the RSI datasets, d = 5 and b = 8. The average time to compute one TV for these RSI datasets is roughly 0.0004 seconds (e.g., 13,440 s / 32M tuples).
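
One way to see where the d·b² counts come from (this reading is an assumption, not spelled out on the slide): with vertically bit-sliced data, the sum of squares over one attribute can be assembled from counts of pairwise-ANDed bit slices, i.e., b² counts per dimension. Below is a minimal sketch of that identity, using plain NumPy 0/1 arrays as stand-ins for P-trees and np.sum as a stand-in for a P-tree root count; sum_of_squares_via_bitslices is an illustrative name.

import numpy as np

def sum_of_squares_via_bitslices(col, b=8):
    # Sum of x^2 over an unsigned b-bit column, assembled purely from the
    # b*b bit-slice AND counts: sum_j sum_k 2^(j+k) * count(slice_j AND slice_k).
    slices = [(col >> j) & 1 for j in range(b)]   # vertical bit slices, LSB first
    total = 0
    for j in range(b):
        for k in range(b):
            total += (1 << (j + k)) * int(np.sum(slices[j] & slices[k]))
    return total

col = np.array([3, 7, 250, 128], dtype=np.uint32)
print(sum_of_squares_via_bitslices(col), int(np.sum(col.astype(np.int64) ** 2)))  # 78942 78942

With d = 5 dimensions and b = 8 bits, that is 5 × 64 = 320 counts, which roughly matches the ratio of the "All COUNTs" to "A single COUNT" columns above (28.73 / 0.09 ≈ 320).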

