High-level Interfaces for Scalable Data Mining
Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences, Ohio State University

1 High-level Interfaces for Scalable Data Mining
Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences, Ohio State University

2 Motivation
- Languages, compilers, and runtime systems for high-end computing typically focus on scientific applications
- Can commercial applications benefit?
  - A majority of top-500 parallel configurations are used as database servers
- Is there a role for parallel systems research?
  - Parallel relational databases: probably not
  - Data mining, OLAP, decision support: quite likely

3 Data Mining
- Extracting useful models or patterns from large datasets
- Includes a variety of tasks: mining associations, mining sequences, clustering data, building decision trees, building predictive models; several algorithms have been proposed for each task
- Both compute and data intensive
- Algorithms are well suited for parallel execution
- High-level interfaces can be useful for application development

4 Project Overview

5 Project Components
- A middleware system called FREERIDE (Framework for Rapid Implementation of Datamining Engines) (SDM 01, SDM 02)
- Performance modeling and prediction, for parallelization strategy selection (SIGMETRICS 2002)
- Data parallel compilation (under submission)
- Translation from mining operators (not yet)
- This paper focuses on the design and evaluation of the interface for shared memory parallelization

6 Outline
- Key observation from mining algorithms
- Parallelization challenges, techniques, and trade-offs
- Programming interface
- Experimental results
  - k-means
  - Apriori
- Summary and future work

7 Common Processing Structure
Structure of common data mining algorithms:

{* Outer Sequential Loop *}
While () {
  {* Reduction Loop *}
  Foreach (element e) {
    (i, val) = process(e);
    Reduc(i) = Reduc(i) op val;
  }
}

This structure applies to the major association mining, clustering, and decision tree construction algorithms. How do we parallelize it on a shared memory machine?
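To make the structure above concrete, here is a minimal C++ sketch of the reduction loop instantiated for a histogram computation. The histogram use case and the function name `histogram` are our own illustration, not from the slides; `process` here is simply a bucket index computation, and `op` is integer addition.

```cpp
#include <cassert>
#include <vector>

// Generalized reduction structure from the slide: each element e is
// processed to yield an (i, val) pair, which is then folded into the
// reduction object with an associative operator.
std::vector<int> histogram(const std::vector<int>& data, int buckets) {
    std::vector<int> reduc(buckets, 0);   // the reduction object
    for (int e : data) {                  // Foreach (element e)
        int i = e % buckets;              // (i, val) = process(e)
        int val = 1;
        reduc[i] = reduc[i] + val;        // Reduc(i) = Reduc(i) op val
    }
    return reduc;
}
```

The key property the slides rely on is that the bucket index `i` is only known after `process(e)` runs, which is what makes static partitioning of `reduc` impossible in general.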

8 Challenges in Parallelization
- Statically partitioning the reduction object to avoid race conditions is generally impossible
- Runtime preprocessing or scheduling also cannot be applied: we cannot tell which parts of the reduction object need updating without processing the element
- The size of the reduction object means significant memory overheads for replication
- Locking and synchronization costs can be significant because of the fine-grained updates to the reduction object

9 Parallelization Techniques
- Full replication: create a copy of the reduction object for each thread
- Full locking: associate a lock with each element of the reduction object
- Optimized full locking: put each element and its corresponding lock on the same cache block
- Fixed locking: use a fixed number of locks, shared across elements
- Cache-sensitive locking: one lock for all elements in a cache block
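The first technique above, full replication, can be sketched in a few lines of standard C++. This is our illustration (the function name `replicated_histogram` and the histogram workload are assumptions, not FREERIDE API): each thread updates a private copy of the reduction object, so no locks are needed, at the cost of memory proportional to threads times object size, plus a sequential merge at the end.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Full replication: one private copy of the reduction object per thread,
// lock-free updates, followed by a merge of all copies.
std::vector<long> replicated_histogram(const std::vector<int>& data,
                                       int buckets, int nthreads) {
    std::vector<std::vector<long>> copies(nthreads,
                                          std::vector<long>(buckets, 0));
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            // Cyclic partition of the input; each thread touches only
            // its own copy, so no synchronization is required.
            for (std::size_t i = t; i < data.size(); i += nthreads)
                copies[t][data[i] % buckets] += 1;
        });
    }
    for (auto& w : workers) w.join();
    std::vector<long> result(buckets, 0);   // merge phase
    for (const auto& c : copies)
        for (int b = 0; b < buckets; ++b) result[b] += c[b];
    return result;
}
```

The memory-overhead concern from the previous slide is visible directly: `copies` occupies `nthreads` times the size of the reduction object.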

10 Memory Layout for Various Locking Schemes
[Figure: memory layouts for full locking, fixed locking, optimized full locking, and cache-sensitive locking; legend distinguishes locks from reduction elements]
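The two cache-aware layouts from the figure can be sketched with aligned structs. This is a minimal illustration under our own naming (`LockedElem`, `LockedBlock`, `add_opt`, `add_cs` are assumptions; the slides use middleware macros like `S_LOCK` instead of `std::mutex`), assuming a 64-byte cache line:

```cpp
#include <cassert>
#include <mutex>

// Optimized full locking: each reduction element sits next to its own
// lock, so both land on the same cache line.
struct alignas(64) LockedElem {
    std::mutex lock;
    long value = 0;
};

// Cache-sensitive locking: a single lock guards all elements that share
// one cache line, cutting the space spent on locks.
constexpr int ELEMS_PER_LINE = 4;
struct alignas(64) LockedBlock {
    std::mutex lock;
    long values[ELEMS_PER_LINE] = {};
};

void add_opt(LockedElem& e, long v) {
    std::lock_guard<std::mutex> g(e.lock);
    e.value += v;
}

void add_cs(LockedBlock* blocks, int idx, long v) {
    LockedBlock& b = blocks[idx / ELEMS_PER_LINE];
    std::lock_guard<std::mutex> g(b.lock);
    b.values[idx % ELEMS_PER_LINE] += v;
}
```

The trade-off mirrors the slide: optimized full locking pays one lock per element but never contends across elements, while cache-sensitive locking saves memory at the cost of false contention between elements in the same block.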

11 Programming Interface: k-means Example
Initialization function:

void Kmeans::initialize() {
  for (int i = 0; i < k; i++) {
    clusterID[i] = reductionobject->alloc(ndim + 2);
  }
  {* Initialize Centers *}
}

12 k-means Example (contd.)
Local reduction function:

void Kmeans::reduction(void *point) {
  for (int i = 0; i < k; i++) {
    dis = distance(point, i);
    if (dis < min) {
      min = dis;
      min_index = i;
    }
  }
  objectID = clusterID[min_index];
  for (int j = 0; j < ndim; j++)
    reductionobject->Add(objectID, j, point[j]);
  reductionobject->Add(objectID, ndim, 1);
  reductionobject->Add(objectID, ndim + 1, min);
}

13 Implementation from the Common Specification

template <class T>
inline void Reducible<T>::Reduc(int ObjectID, int Offset,
                                void (*func)(void *, void *), int *param) {
  T *group_address = reducgroup[ObjectID];
  int offset;
  switch (TECHNIQUE) {
    case FULL_REPLICATION:
      func(&group_address[Offset], param);
      break;
    case FULL_LOCKING:
      offset = abs_offset(ObjectID, Offset);
      S_LOCK(&locks[offset]);
      func(&group_address[Offset], param);
      S_UNLOCK(&locks[offset]);
      break;
    case OPTIMIZED_FULL_LOCKS:
      S_LOCK(&group_address[Offset * 2]);
      func(&group_address[Offset * 2 + 1], param);
      S_UNLOCK(&group_address[Offset * 2]);
      break;
  }
}

14 Experimental Platform
Small SMP machine:
- Sun Ultra Enterprise 450
- 4 x 250 MHz Ultra-II processors
- 1 GB of 4-way interleaved main memory
Large SMP machine:
- Sun Fire 6800
- 24 x 900 MHz Sun UltraSparc III processors
- A 96 KB L1 cache and a 64 MB L2 cache per processor
- 24 GB main memory

15 Results: Scalability and Middleware Overhead for Apriori (4-processor SMP machine)

16 Results: Scalability and Middleware Overhead for Apriori (large SMP machine)

17 Results: Scalability and Middleware Overhead for k-means (4-processor SMP machine; 200 MB dataset, k = 1000)

18 Results: Scalability and Middleware Overhead for k-means (large SMP machine)

19 Compiler Support
- We use a data parallel dialect of Java, well suited for expressing common mining algorithms: the main computational loops are data parallel
- We use the notion of a reduction interface to implement reduction objects
- Our compiler generates middleware code

20 Experimental Evaluation
Currently limited to distributed memory parallelization

