
1 System Support for High Performance Scientific Data Mining
Gagan Agrawal, Ruoming Jin, Raghu Machiraju, S. Parthasarathy
Department of Computer and Information Sciences, Ohio State University

2 Scientific Data Mining Problem
- Datasets used for scientific data mining are large, particularly those produced by simulations.
- Our understanding of which algorithms and parameters will give the desired insights is limited.
- The time required to implement different algorithms and run them with different parameters on large datasets slows down the scientific data mining process.

3 Project Overview
- FREERIDE (Framework for Rapid Implementation of Datamining Engines) serves as the base system.
- Already demonstrated for a variety of standard mining algorithms.
- Current work: feature analysis and mining of simulation data.

4 FREERIDE offers:
- The ability to rapidly prototype a high-performance mining implementation
- Distributed memory parallelization
- Shared memory parallelization
- Ability to process large and disk-resident datasets
- Only modest modifications to a sequential implementation for the above three
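The "modest modifications" point can be illustrated with a small sketch. This is not FREERIDE code or its API. All names here (`chunks`, `local_reduce`, `sequential_reduce`, `parallel_reduce`) are hypothetical, and the bucketing rule is arbitrary. The sketch assumes the work is a reduction: each worker fills a private reduction object from one chunk of data, and the private copies are merged afterwards. Chunks stand in for blocks of a disk-resident dataset.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunks(data, size):
    """Yield fixed-size chunks, standing in for blocks read from disk."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def local_reduce(chunk):
    """Sequential reduction over one chunk into a private reduction object."""
    counts = Counter()
    for item in chunk:
        counts[item % 10] += 1  # bucket by last digit (illustrative only)
    return counts


def sequential_reduce(data, chunk_size):
    """Baseline: process the chunks one after another."""
    total = Counter()
    for chunk in chunks(data, chunk_size):
        total.update(local_reduce(chunk))
    return total


def parallel_reduce(data, chunk_size, workers=4):
    """Same per-chunk code, run by a pool of workers, then merged."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(local_reduce, chunks(data, chunk_size)):
            total.update(partial)  # merge the private copies
    return total


data = list(range(1000))
result = parallel_reduce(data, chunk_size=64)
```

The per-chunk function is identical in both versions; only the driver changes. Threads are used here only to show the structure, since CPython's global interpreter lock limits the speedup for pure-Python work.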

5 Key Observation from Mining Algorithms
- Popular algorithms have a common canonical loop.
- This loop can be used as the basis for supporting a common middleware.

    While( ) {
        forall( data instances d ) {
            I = process(d)
            R(I) = R(I) op d
        }
        …….
    }
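As one possible reading of the canonical loop, the sketch below writes the inner `forall` as a generic helper and uses one-dimensional k-means as an instance. The helper name `generalized_reduction`, the function `kmeans_1d`, the convergence test, and the iteration cap are all assumptions made for illustration; the slide leaves the outer loop condition unspecified.

```python
def generalized_reduction(data, process, op, reduction):
    """One pass of the canonical loop: R(I) = R(I) op d for each instance d."""
    for d in data:
        i = process(d)
        reduction[i] = op(reduction[i], d)
    return reduction


def kmeans_1d(points, centers, max_iters=20):
    """1-D k-means expressed through the canonical loop (illustrative only)."""
    for _ in range(max_iters):
        current = centers

        def nearest(x):
            # process(d): index of the closest current center
            return min(range(len(current)), key=lambda k: abs(x - current[k]))

        def accumulate(r, x):
            # op: add the point to the running (sum, count) pair
            return (r[0] + x, r[1] + 1)

        # Reduction object: one (sum, count) pair per cluster.
        acc = [(0.0, 0) for _ in current]
        acc = generalized_reduction(points, nearest, accumulate, acc)

        # Empty clusters keep their previous center.
        new_centers = [s / n if n else c for (s, n), c in zip(acc, current)]
        if new_centers == current:  # assumed stopping rule
            break
        centers = new_centers
    return centers


final_centers = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```

Because `accumulate` is associative and commutative, the order in which instances are visited does not affect the reduction object, which is what makes this loop shape a convenient target for parallelization.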

6 Performance of Shared Memory Parallelization: K-means clustering

7 Performance on Cluster of SMPs: Apriori Association Mining

8 SPIES on FREERIDE
- Developed a new communication-efficient decision tree construction algorithm: Statistical Pruning of Intervals for Enhanced Scalability (SPIES).
- Combines RainForest with statistical pruning of intervals of numerical attributes to reduce memory requirements and communication volume.
- Does not require sorting of data, or partitioning and writing back of records.
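The sketch below is not SPIES. It only illustrates the idea of working with intervals of a numerical attribute: class counts are collected per interval in one unsorted pass, and candidate splits are scored at interval boundaries. The statistical pruning step is omitted entirely. Equal-width intervals, the Gini criterion, and all function names are assumptions made for illustration, and the input is assumed to be non-empty.

```python
def interval_histogram(values, labels, lo, hi, n_intervals, n_classes):
    """Count class labels per equal-width interval in a single unsorted pass."""
    width = (hi - lo) / n_intervals
    hist = [[0] * n_classes for _ in range(n_intervals)]
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_intervals - 1)
        hist[idx][y] += 1
    return hist


def gini(counts):
    """Gini impurity of a class-count vector (0.0 for an empty vector)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def best_boundary_split(hist, lo, hi):
    """Score each interval boundary by weighted Gini; return (threshold, score)."""
    n_intervals = len(hist)
    n_classes = len(hist[0])
    width = (hi - lo) / n_intervals
    total = [sum(row[c] for row in hist) for c in range(n_classes)]
    n = sum(total)
    left = [0] * n_classes
    best = (None, float("inf"))
    for b in range(n_intervals - 1):
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(total, left)]
        n_left = sum(left)
        score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        if score < best[1]:
            best = (lo + (b + 1) * width, score)
    return best


values = [1, 2, 3, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1]
hist = interval_histogram(values, labels, lo=0, hi=10, n_intervals=5, n_classes=2)
split = best_boundary_split(hist, lo=0, hi=10)
```

The histogram has one row per interval rather than one per distinct attribute value, so its size is fixed by the number of intervals. Building it is also a reduction of the same shape as the canonical loop, with the interval index playing the role of `I`.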

9 Broader Research Agenda

10 Applying FREERIDE for Scientific Data Mining
- Focusing on the feature extraction, tracking, and mining approach developed by Machiraju et al.
- A feature is a region of interest in a dataset.
- The approach provides a suite of algorithms for extracting and tracking features.

11 A Feature Analysis Algorithm
[Diagram; recoverable labels: Aggregate, Classify Points, Rank, Denoise, Track, Transform, Operator, Tour Grid, ROIs, Data Catalog, Classify-Aggregate]

12 Ongoing Work: Parallelization Using FREERIDE
- Most of the steps involve generalized reductions, which FREERIDE supports well.
- Extensions to FREERIDE are required for the aggregation and tracking steps.
- Overall, FREERIDE can allow rapid implementation of scalable versions of a variety of steps and algorithms that are part of the feature mining paradigm.

