Lionel F. Lovett, II Jackson State University Research Alliance in Math and Science Computer Science and Mathematics Division Mentors: George Ostrouchov.

Presentation transcript:

Lionel F. Lovett, II
Jackson State University
Research Alliance in Math and Science
Computer Science and Mathematics Division
Mentors: George Ostrouchov and Houssain Kettani

RobustMap: A Fast and Robust Algorithm for Dimension Reduction and Clustering

Abstract

Databases can be very large due to the number of items and due to the number of attributes (high-dimensionality) associated with each item. Clustering reduces the number of items to their representative clusters and dimension reduction reduces the number of attributes. In addition, visualization of high-dimensional data requires reduction to lower-dimensional views that are often displayed as two- or three-dimensional plots. Traditional dimension reduction algorithms such as the singular value decomposition based principal components are computationally demanding and can be very slow. As the size of databases continues to grow, so does the demand for faster methods to visualize the data.

RobustMap is a new, fast and robust dimension reduction method for high-dimensional datasets, which can separate outlying clusters from the main body of the data while computing a low-dimensional representation. It relies on stochastic concepts and on statistical distance distributions. The algorithm considers distance distributions from random and from extreme points to determine projection axes and clusters for dimension reduction. In determining the clusters, RobustMap focuses on the largest cluster, excluding outlying clusters. The visualization applications of this algorithm may be implemented in a range of disciplines, which include medical databases, images, time series, music, and data mining.

Project Goals/Tasks

The focus of this project was to develop a fast and robust dimension reduction method for large, high-dimensional data sets. The project improves on an algorithm that is set up for indexing, data-mining and creating visualizations of traditional and multimedia datasets.
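For contrast with the faster method the poster proposes, the SVD-based principal components baseline mentioned in the abstract can be sketched as follows. This is only an illustrative sketch: the function name `pca_svd` and the random test data are assumptions, not part of the poster. For n items with p attributes (n ≥ p), the thin SVD used here costs roughly O(n p^2), which is what makes this approach slow on large, high-dimensional databases.

```python
import numpy as np

def pca_svd(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each attribute
    # Thin SVD: Xc = U @ diag(s) @ Vt; the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T     # n x k low-dimensional coordinates

# Hypothetical data: 200 items with 10 attributes, reduced to a 2-D view.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = pca_svd(X, 2)
```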
The Research Alliance in Math and Science program is sponsored by the Mathematical, Information, and Computational Sciences Division, Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. De-AC05-00OR. This work has been authored by a contractor of the U.S. Government; accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

OAK RIDGE NATIONAL LABORATORY
U.S. DEPARTMENT OF ENERGY

Results

RobustMap correctly extracts the largest cluster.
RobustMap performs dimension reduction.
RobustMap maps data to k dimensions in O(nk) time.
RobustMap exploits robust statistics.

Special thanks to Robert M. Day

Applications

Similarity searching in string databases, as in the case of spelling, typing and OCR error correction.
Medical databases, where 1-d objects (e.g., ECGs), 2-d images (e.g., X-rays) and 3-d images (e.g., MRI brain scans) are stored.
Time series, e.g., financial data such as stock prices, or scientific databases.
Data mining and visualization applications.

Future Research

Inside the ratio function, the threshold will be computed based on probability and data.
A loop to reduce dimensionality for the remaining clusters will be written.
Additional theory for RobustMap will be developed.

Orthogonal Projection

Given distances along s_a s_b, RobustMap computes distances within the orthogonal hyperplane.

Dimension Reduction

All visualizations require low-dimensional views, 2-D or 3-D.
Through visualization, many structures (e.g., patterns and clusters) that were previously unknown are discovered.
Multimedia database searching requires fast algorithms.
Reducing to lower dimensions makes similarity searching faster.

RobustMap's Processes

1. Compute n distances from the first object
   Take the point of largest distance
   Repeat
2. Plots and Clusters
   Create diagnostic histograms for distances
   Estimate probability density of distances
   Find ratio of actual to expected distances
   Exclude high-ratio objects as outlying clusters
3. Finish projection using only the remaining objects

RobustMap Algorithm

RobustMap is based on FastMap and statistical properties of distance distributions. FastMap is too sensitive to outliers. Using robust statistics, RobustMap extracts the largest cluster from a dataset, while identifying outlying clusters and reducing dimensionality.

Figure: Dotted lines represent FastMap's pivot pairs and axis. Bold lines represent RobustMap's pivot pairs and axis.
*Notice that RobustMap distributes the points more evenly along its axis.
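
The pivot selection and orthogonal projection steps described above can be illustrated with a FastMap-style sketch. It assumes Euclidean data held in a NumPy array, and the function names (`choose_pivots`, `project_on_pivots`, `residual_distance`) are hypothetical. It shows only the farthest-point heuristic and the cosine-law projection that FastMap uses; it does not reproduce RobustMap's robust step of excluding outlying clusters by the ratio of actual to expected distances, which the poster does not specify in enough detail to code.

```python
import numpy as np

def choose_pivots(X, start=0, rounds=2):
    """Farthest-point heuristic: compute n distances from an object,
    take the point of largest distance, and repeat."""
    b = start
    for _ in range(rounds):
        a = int(np.argmax(np.linalg.norm(X - X[b], axis=1)))  # farthest from b
        b = int(np.argmax(np.linalg.norm(X - X[a], axis=1)))  # farthest from a
    return a, b

def project_on_pivots(X, a, b):
    """Coordinate of every object along the pivot line, using distances only
    (law of cosines): x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b))."""
    d_a = np.linalg.norm(X - X[a], axis=1)
    d_b = np.linalg.norm(X - X[b], axis=1)
    d_ab = np.linalg.norm(X[a] - X[b])
    return (d_a**2 + d_ab**2 - d_b**2) / (2.0 * d_ab)

def residual_distance(X, x, i, j):
    """Distance between objects i and j within the hyperplane orthogonal
    to the pivot line: d'(i,j)^2 = d(i,j)^2 - (x_i - x_j)^2."""
    d2 = np.sum((X[i] - X[j]) ** 2)
    return float(np.sqrt(max(d2 - (x[i] - x[j]) ** 2, 0.0)))

# Hypothetical data: 200 objects with 5 attributes.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
a, b = choose_pivots(X)
x = project_on_pivots(X, a, b)  # first low-dimensional coordinate
```

Each axis needs only a constant number of passes over the n objects, so computing k axes this way takes O(nk) distance evaluations, in line with the running time stated in the Results. Because the two pivots are the most extreme points found, a single far outlier can dominate the axis, which is the sensitivity the poster attributes to FastMap.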