Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines

James Michaelis, Deborah L. McGuinness, Stephan Zednik, Patrick West, Peter Arthur Fox
Rensselaer Polytechnic Institute, th St., Troy, NY, United States
Poster: IN51D-1713

Glossary:
- RPI – Rensselaer Polytechnic Institute
- TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
- VSTO – Virtual Solar Terrestrial Observatory
- FITS – Flexible Image Transport System

Acknowledgments:
- Sapan Shah and Naveen Sridhar from the Tetherless World Constellation at RPI
- Joan Burkepile, Steve Tomczyk and Leonard Sitongia at the High Altitude Observatory

Sponsors: National Science Foundation

Motivations and Challenges:
Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability). Time series visualizations of solar activity, created by the High Altitude Observatory [1], enable the needed analyses.

This work focuses on two challenges:
- Only small sections of the data will typically contain content of interest to scientists.
- Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).

Based on these challenges, one goal of this work was to enable scientists to retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Data Management Strategies:
- Provenance records for individual visualizations.
- Ontological classification of visualizations, using DQ and STOM.
- Encoding records in RDF Datacube [2] (proposed).

Case Study: Coronal Multi-channel Polarimeter (CoMP)

[Figure: CoMP data ingest pipeline. Labels: Mauna Loa Solar Observatory (MLSO), Hawaii; Raw Image Data Captured; National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO; Follow-up Processing on Raw Data; Publishes; Intensity Visualizations]

- Time-stamped Observation Logs, maintained by MLSO staff. Comments on: weather and instrument conditions.
- Time-stamped Activity Logs, maintained by MLSO staff. Comments on: solar events (Coronal Mass Ejections, Active Regions).

Semantic Visualization Provenance Records:

What Datacube Is:
- An RDF vocabulary for expressing multidimensional data.
- Designed for categorizing data points and enabling data aggregations.

Properties attached to datasets/slices/observations:
- Dimensions: Year, Metric
- Attributes: GBU Metric
- Measures: 146 (the value)

Datacube Usage:
For HAO visualization records, Datacube can be used in two ways:
- Returning aggregations of statistics for images (e.g., GBU results).
- Returning sets of visualizations (data points) for further exploration, based on constraints (e.g., temporal range).

Use Cases:
- Activity Log Usage: Return images corresponding to a specific solar event record.
- Provenance (utilized data product): Return the set of images produced using a given flat field configuration file.
- Provenance (utilized process): Return the set of images produced by version 2.0 of the process “Extract Intensity”.
- Observer Log Usage: For a given observer log comment, return visualizations within 2 hours of the comment timestamp.

Next Steps:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledge Base [3].
- Expanded use of dimensions in the data cube, to include FITS header data.

References:
[1] Mauna Loa Solar Observatory (High Altitude Observatory Site):
[2] RDF Datacube Vocabulary:
[3] Heliophysics Event Knowledge Base:
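
Illustrative sketch (not part of the poster's implementation): the two Datacube usages and the Observer Log use case described above can be approximated in plain Python, without an RDF store. The record layout, file names, timestamps, and GBU labels below are all hypothetical placeholders chosen for illustration; a real deployment would presumably query RDF Datacube observations instead of a Python list.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical visualization records: one "observation" per image, with a
# timestamp dimension and a GBU attribute. All values are invented.
OBSERVATIONS = [
    {"image": "comp_001.png", "time": datetime(2012, 6, 1, 17, 30), "gbu": "good"},
    {"image": "comp_002.png", "time": datetime(2012, 6, 1, 19, 45), "gbu": "bad"},
    {"image": "comp_003.png", "time": datetime(2012, 6, 1, 22, 10), "gbu": "good"},
    {"image": "comp_004.png", "time": datetime(2012, 6, 2, 18, 0), "gbu": "ugly"},
]


def within_window(observations, comment_time, hours=2):
    """Return observations whose timestamp is within +/- `hours` of comment_time."""
    window = timedelta(hours=hours)
    return [o for o in observations if abs(o["time"] - comment_time) <= window]


def gbu_counts(observations):
    """Aggregate observations into a count per GBU label."""
    return Counter(o["gbu"] for o in observations)


# Hypothetical observer log comment timestamp.
comment_time = datetime(2012, 6, 1, 18, 0)

# Usage 2: return a set of visualizations constrained by a temporal range.
nearby = within_window(OBSERVATIONS, comment_time)
print([o["image"] for o in nearby])

# Usage 1: return an aggregation of statistics over images.
print(dict(gbu_counts(OBSERVATIONS)))
```

The window is treated as symmetric around the comment timestamp; the poster does not say whether "within 2 hours" means before, after, or both, so that choice is an assumption.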