
1 Automatic Data Selection for In Situ Analysis
Science Problem: Cognitive capacity (human/scientist understanding), storage, and I/O have not kept up with our capacity to generate massive amounts of physics-based simulation data. Data must be reduced, but which data do we keep or look at?
Technical Solution: Apply automatic data selection operations, using various metrics and criteria, to emulate the scientist's exploration during physics simulations (in situ) and make the data bandwidth use more "information dense." Simulation output is reduced to a representative set based on the selected metric.
Science Impact: Scientist and analyst time is used more efficiently, and bandwidth (cognitive, storage, and I/O) is "richer," as more scientifically relevant simulation information is packed into it.
Woodring, Myers, Nouanesengsy, Wendelberger, Patchett, Fasel, Ahrens
[Diagram: in situ selection pipeline with raw data, lightweight detectors, discovery events, heavyweight triggers, dynamic computational data products, and flow control.]
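As an illustration of the selection idea (a sketch, not the project's actual implementation), the example below pairs a lightweight detector (a cheap per-block metric) with a heavyweight trigger that keeps only the highest-scoring blocks of each timestep; the metric, block layout, and budget are assumptions made for the example.

```python
import numpy as np

def interest_metric(block):
    """Lightweight detector: a cheap per-block statistic (here, standard
    deviation) used as a proxy for how much information a block carries."""
    return float(np.std(block))

def select_in_situ(blocks, budget):
    """Heavyweight trigger: keep only the `budget` highest-scoring blocks of a
    timestep, so the bandwidth spent on storage is more 'information dense'."""
    ranked = sorted(range(len(blocks)),
                    key=lambda i: interest_metric(blocks[i]), reverse=True)
    keep = set(ranked[:budget])
    return [b for i, b in enumerate(blocks) if i in keep]

# Example: 64 blocks of a simulated field; keep the 8 most variable ones.
rng = np.random.default_rng(0)
blocks = [rng.normal(scale=rng.uniform(0.1, 2.0), size=(16, 16, 16))
          for _ in range(64)]
selected = select_in_situ(blocks, budget=8)
print(f"kept {len(selected)} of {len(blocks)} blocks")
```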

2 Combining Statistics, Sampling, and Indexing to Quantitatively Scale Massive Scientific Data
Science Problem: Climate and cosmological analysis and exploration via queries on their massive data sets may still result in a massive amount of data for the scientist to comb through and/or transfer.
Technical Solution: Combine bitmap indexing with stratified random sampling (stratified by bins, random within bins) and precalculated errors to quickly and quantitatively scale data.
Science Impact: Climate and cosmological scientists are able to interactively and automatically scale data queries down to smaller samples with automatic error estimates, accelerating their scientific analysis workflow.
Su, Agrawal, Woodring, Myers, Wendelberger, and Ahrens. "Taming Massive Distributed Datasets: Data Sampling Using Bitmap Indices." To appear in HPDC 2013.
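A minimal sketch of the sampling idea, assuming plain NumPy binning in place of the bitmap index described in the paper: stratify by value bins, sample uniformly at random within each bin, and precompute a per-bin error estimate (standard error of the stratum mean). Function names and the error measure are illustrative.

```python
import numpy as np

def stratified_sample(values, n_bins=16, frac=0.01, rng=None):
    """Stratify by value bins (standing in for the bitmap index), draw a
    uniform random sample within each bin, and precompute a per-bin error
    estimate to report alongside the sample."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.histogram_bin_edges(values, bins=n_bins)
    bin_ids = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    sample_idx, bin_errors = [], {}
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size == 0:
            continue
        k = max(1, int(frac * members.size))
        chosen = rng.choice(members, size=k, replace=False)
        sample_idx.append(chosen)
        # Precalculated error: standard error of the mean for this stratum.
        bin_errors[b] = (float(np.std(values[chosen], ddof=1) / np.sqrt(k))
                         if k > 1 else 0.0)
    return np.concatenate(sample_idx), bin_errors

# Example: scale a million-element array down to a ~1% stratified sample.
data = np.random.default_rng(1).lognormal(size=1_000_000)
idx, errors = stratified_sample(data, n_bins=16, frac=0.01)
print(len(idx), "samples; worst per-bin std. error:", max(errors.values()))
```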

3 Metrics and Workflow for Quantifying the Quality of Reduction Transformations on Large-Scale, Scientific Scalar Data
Science Problem: Storage and I/O have not kept up with the capacity of climate and cosmological simulations to generate data. Data must be reduced, but what has been lost in the process of reduction?
Technical Solution: After every reducing transformation (T), assess and record the residuals/errors via compact measures (M) that compare the reduced data (a) to the "original" data (A) through reconstructed data (B).
Science Impact: Measuring data uncertainty via quantitative quality provenance shows that climate and cosmological scientists are able to use reduced data for scientific analysis.
Woodring, Shafii, Biswas, Myers, Wendelberger, Hamann, and Shen. "Metrics and Workflow for Quantifying the Quality of Reduction Transformations on Large-Scale, Scientific Scalar Data." Submitted to LDAV 2013.
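A sketch of the workflow with illustrative metric choices (the paper's specific measures M may differ): compare the original data A against data B reconstructed from the reduced representation a, and record compact error measures as quality provenance.

```python
import numpy as np

def quality_metrics(A, B):
    """Compact measures M comparing original data A to reconstructed data B;
    these would be recorded as quality provenance after each reduction
    transformation T. Metric choices here are illustrative."""
    diff = A - B
    value_range = np.ptp(A) or 1.0
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return {
        "rmse": rmse,
        "nrmse": rmse / float(value_range),
        "max_abs_error": float(np.max(np.abs(diff))),
        "pearson_r": float(np.corrcoef(A.ravel(), B.ravel())[0, 1]),
    }

# Example: a subsampling transformation T (keep every 4th point, 'a') and its
# reconstruction B (repeat each kept value), compared against the original A.
A = np.sin(np.linspace(0, 8 * np.pi, 4096))
a = A[::4]                      # reduced data
B = np.repeat(a, 4)[:A.size]    # reconstructed data
print(quality_metrics(A, B))
```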

4 Exploration of Exascale In Situ Visualization and Analysis Approaches
Principal Investigator: James Ahrens et al., LANL. Sept. 25, 2013.
Novel Ideas: Perception, cognitive capacity, and computer bandwidth are limited, but the scale of data continues to increase. To save both compute cycles and analyst time, we explicitly select and present data to the scientist:
- Time, space, and variable selection algorithms
- Storage and indexing of selected data products
- Interactive presentation and query methods
- Illustrative and artistic highlighting to perceptually drive scientists to key data
Expected Impact: Save compute time, as only selected data are stored within limited I/O bandwidth and capacity; not all data can be saved from an exascale simulation, so we must be prescriptive about which data are saved, and we will provide data selection algorithms. Save analysis time, as selected data are presented to the scientist; the resulting data will still be massive, more than any one scientist can look at, and interaction, query, and highlighting methods drive scientists to view key data.
Milestones and Status: In our first year, we have developed several data selection algorithms, designed a prototype visualization and analysis system that utilizes selected data, and quantified the effects of selecting scientific data.
Milestone                            Expected  Actual
Data Selection Algorithms            03/13     03/13
Data Product Explorer                09/13     09/13
Selection Quality Measurements       09/13     09/13
Advanced Selection Algorithms        03/14
Perceptually-Driven Presentation     03/14
Selection in Data Product Explorer   09/14
Domain-Driven Selection              03/15
Bandwidth Utilization Adjustment     09/15
Advanced Presentation Methods        09/15
[Diagram: Exascale and "Big Data" sources (running simulations, running experiments, static repositories, etc.) feeding Data Selection Algorithms (time, space, variable, product, etc.) and Analysis with Perceptually-Driven Highlighting; data products include raw, image, and geometry.]

