System Support for High Performance Scientific Data Mining
Gagan Agrawal, Ruoming Jin, Raghu Machiraju, S. Parthasarathy
Department of Computer and Information Sciences, Ohio State University

Scientific Data Mining Problem
- Datasets used for scientific data mining are large, particularly those produced by simulations
- Our understanding of which algorithms and parameters will give the desired insights is limited
- The time required to implement different algorithms and run them with different parameters on large datasets slows down the scientific data mining process

Project Overview
- FREERIDE (Framework for Rapid Implementation of Datamining Engines) is the base system
- Already demonstrated for a variety of standard mining algorithms
- Currently working on feature analysis and mining of simulation data

FREERIDE offers:
- The ability to rapidly prototype a high-performance mining implementation
- Distributed memory parallelization
- Shared memory parallelization
- Ability to process large and disk-resident datasets
- Only modest modifications to a sequential implementation for the above three

Key Observation from Mining Algorithms
- Popular algorithms have a common canonical loop
- Can be used as the basis for supporting a common middleware

    While( ) {
        forall( data instances d) {
            I = process(d)
            R(I) = R(I) op d
        }
        …….
    }
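As a minimal sketch of how one algorithm fits the canonical loop, the k-means pass below maps each data instance to a cluster index (the `process` step) and accumulates it into a reduction object (the `R(I) = R(I) op d` step). The function names `nearest` and `kmeans`, the `max_iters` parameter, and the convergence test are illustrative assumptions, not FREERIDE's actual interface.

```python
def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, centers, max_iters=10):
    """Hypothetical k-means written in the shape of the canonical loop."""
    centers = [list(c) for c in centers]
    dim = len(centers[0])
    for _ in range(max_iters):                      # While( ) {
        # Reduction object R: per-cluster coordinate sums and counts.
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for d in points:                            #   forall(data instances d) {
            i = nearest(d, centers)                 #     I = process(d)
            for j, x in enumerate(d):               #     R(I) = R(I) op d
                sums[i][j] += x
            counts[i] += 1
        # Post-loop step: derive new centers from the reduction object.
        # An empty cluster keeps its previous center.
        new_centers = [
            [s / counts[i] for s in sums[i]] if counts[i] else list(centers[i])
            for i in range(len(centers))
        ]
        if new_centers == centers:                  # stop once centers are stable
            break
        centers = new_centers
    return centers
```

Because the update to the reduction object is an associative, commutative sum, the order in which data instances are visited does not change the result, which is what makes this loop shape a plausible target for a shared middleware.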

Performance of Shared Memory Parallelization: K-means clustering

Performance on Cluster of SMPs: Apriori Association Mining
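The following is a hedged sketch of why support counting in association mining suits distributed execution: each partition of the data builds its own local reduction object, and the local objects are merged by addition. It is a simplification that counts every k-item subset rather than Apriori's pruned candidate sets, and it processes the partitions sequentially. The names `local_counts` and `frequent_itemsets` are hypothetical and do not come from FREERIDE.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-item subset seen in one partition (a local reduction object)."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, k, min_support):
    """Merge per-partition counts and keep itemsets that meet min_support.

    In a real deployment each partition could be handled by a separate node
    or thread; here they are processed one after another for simplicity.
    """
    total = Counter()
    for part in partitions:
        total.update(local_counts(part, k))   # global reduction: add the counts
    return {s: c for s, c in total.items() if c >= min_support}
```

For example, `frequent_itemsets([[["a", "b"], ["a", "c"]], [["a", "b", "c"], ["b"]]], k=2, min_support=2)` keeps the pairs `("a", "b")` and `("a", "c")`, each with a count of 2. Only the compact count tables cross partition boundaries, not the transactions themselves.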

SPIES On (a) FREERIDE
- Developed a new communication-efficient decision tree construction algorithm: Statistical Pruning of Intervals for Enhanced Scalability (SPIES)
- Combines RainForest with statistical pruning of intervals of numerical attributes to reduce memory requirements and communication volume
- Does not require sorting of data, or partitioning and writing-back of records

Broader Research Agenda

Applying FREERIDE for Scientific Data Mining
- Focusing on the feature extraction, tracking, and mining approach developed by Machiraju et al.
- A feature is a region of interest in a dataset
- A suite of algorithms for extracting and tracking features

A Feature Analysis Algorithm: Classify-Aggregate
[Diagram; labels as extracted: Aggregate, Classify, Points, Rank, Denoise, Track, Transform, Operator, Tour, Grid, ROIs, Data, Catalog]

Ongoing Work: Parallelization Using FREERIDE
- Most of the steps involve generalized reductions, which are supported well in FREERIDE
- Extensions to FREERIDE are required for the aggregation and tracking steps
- Overall, FREERIDE can allow rapid implementation of scalable versions of a variety of steps and algorithms that are part of the feature mining paradigm
