Informatics takes on data and information quality, uncertainty and bias (in atmospheric science) Peter Fox (TWC/RPI), and … Stephan Zednik 1, Gregory Leptoukh.


Informatics takes on data and information quality, uncertainty and bias (in atmospheric science)
Peter Fox (TWC/RPI), and … Stephan Zednik (1), Gregory Leptoukh (2), Chris Lynnes (2), Jianfu Pan (3)
1. Tetherless World Constellation, Rensselaer Polytechnic Inst.
2. NASA Goddard Space Flight Center, Greenbelt, MD, United States
3. Adnet Systems, Inc.

Where are we with respect to this data challenge?
"The user cannot find the data; if he can find it, he cannot access it; if he can access it, he doesn't know how good the data are; if he finds them good, he cannot merge them with other data." – The Users View of IT, NAS

Definitions (ATM)
– Quality: is in the eye of the beholder – a worst-case scenario… or a good challenge.
– Uncertainty: has aspects of accuracy (how accurately the real-world situation is assessed; this also includes bias) and precision (to how many digits).
– Bias has two aspects:
  – Systematic error resulting in distortion of measurement data, caused by prejudice or a faulty measurement technique.
  – A vested interest, or strongly held paradigm or condition, that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
    – Psychological: for example, when data providers audit their own data, they usually have a bias toward overstating its quality.
    – Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled. (Larry English)
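The accuracy/precision distinction above can be made concrete: given retrievals paired with ground truth, bias is the mean difference, precision is the scatter about that bias, and RMSE combines the two. A minimal sketch with hypothetical AOD values (not taken from any dataset discussed here):

```python
import numpy as np

# Hypothetical matchups: retrieved AOD vs. coincident ground-truth AOD.
retrieved = np.array([0.12, 0.15, 0.11, 0.14, 0.13])
truth     = np.array([0.10, 0.10, 0.10, 0.10, 0.10])

diff = retrieved - truth
bias = diff.mean()                    # systematic offset (an accuracy aspect)
precision = diff.std(ddof=1)          # scatter of the differences about the bias
rmse = np.sqrt((diff ** 2).mean())    # combined uncertainty

print(f"bias={bias:.3f} precision={precision:.3f} rmse={rmse:.3f}")
```

Because every retrieval here sits above the truth, the bias dominates and the RMSE exceeds the precision, which is exactly the "systematic error" case from the definition of bias.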

Data quality needs: fitness for purpose / use
– Measuring Climate Change:
  – Model validation: gridded contiguous data with uncertainties.
  – Long-term time series: bias assessment is a must, especially for sensor degradation, orbit and spatial sampling changes.
– Studying phenomena using multi-sensor data:
  – Cross-sensor bias characterization is needed.
– Realizing Societal Benefits through Applications:
  – Near-real-time transport/event monitoring: in some cases, coverage and timeliness may be more important than accuracy.
  – Pollution monitoring (e.g., air quality exceedance levels): accuracy.
– Educational (users are generally not well-versed in the intricacies of quality; taking all data as usable can impair educational lessons): only the best products.

MODIS vs. MERIS
Same parameter, same space & time – different results. Why?
A threshold used in MERIS processing effectively excludes high aerosol values.
Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are "obstacles," not signal.

Spatial and temporal sampling – how to quantify it to make it useful for modelers?
– Completeness: the MODIS dark-target algorithm does not work over deserts.
– Representativeness: monthly aggregation is not enough for MISR, and even for MODIS.
– Spatial sampling patterns differ between MODIS Aqua and MISR Terra: "pulsating" areas over ocean are oriented differently due to different orbital directions during day-time measurement → cognitive bias.
(Figure: MODIS Aqua AOD, July 2009; MISR Terra AOD, July 2009)

Anomaly Example: South Pacific
The MODIS Level 3 data day definition leads to an artifact in correlation.

…is caused by an Overpass Time Difference

Sensitivity Study: effect of the Data Day definition on Ocean Color data correlation with Aerosol data
Correlation between MODIS-Aqua AOD (Ocean group product) and MODIS-Aqua AOD (Atmosphere group product); pixel count distribution.
Only half of the Data Day artifact is present, because the Ocean group uses the better Data Day definition!
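The role of the data day boundary can be sketched numerically: local solar time is approximately UTC plus longitude/15 hours, so two pixels observed minutes apart near the dateline can land in different fixed-UTC data days even though both are near local noon. The times and longitudes below are illustrative, not actual MODIS geolocations:

```python
def local_solar_time(utc_hour, lon_deg):
    """Approximate local solar time in hours: UTC hour + longitude / 15."""
    return (utc_hour + lon_deg / 15.0) % 24.0

# Two hypothetical pixels observed 12 minutes apart near the dateline.
# A data day bounded at 00:00 UTC places them in different data days,
# even though both observations are near local noon.
lst_before = local_solar_time(23.9, 178.0)    # 23:54 UTC, data day N
lst_after  = local_solar_time(0.1, -178.0)    # 00:06 UTC, data day N+1
```

A data day defined along the satellite's local-time swath (the "better" definition above) would keep such neighboring observations together, removing this seam from correlation maps.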

Why so difficult?
– Quality is perceived differently by data providers and data recipients.
– There are many different qualitative and quantitative aspects of quality.
– Methodologies for dealing with data quality are just emerging; almost nothing exists for remote sensing data quality.
– Even the most comprehensive review (Batini's book) demonstrates that there are no preferred methodologies for solving many data quality issues.
– Little funding was allocated to data quality in the past, as the priority was to build an instrument, launch a rocket, collect and process data, and publish a paper using just one set of data. Each science team handled quality differently.

More terminology
"Even a slight difference in terminology can lead to significant differences between data from different sensors. It gives an IMPRESSION of the data being of bad quality while in fact they measure different things. For example, the MODIS and MISR definitions of the aerosol 'fine mode' are different, so a direct comparison of fine modes from MODIS and MISR does not always give good correlation." – Ralph Kahn, MISR Aerosol Lead

Quality Control vs. Quality Assessment
– Quality Control (QC) flags in the data (assigned by the algorithm) reflect the "happiness" of the retrieval algorithm: e.g., all the necessary channels indeed had data, there were not too many clouds, the algorithm converged to a solution, etc.
– Quality assessment is done by analyzing the data "after the fact" through validation, intercomparison with other measurements, self-consistency, etc. It is presented as bias and uncertainty. It is rather inconsistent and can be found scattered across papers and validation reports.
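QC screening of the kind the Data Quality Screening Service performs can be sketched in a few lines. The flag convention (0 = bad … 3 = best) mirrors the common MODIS-style confidence flag, but the array values are invented for illustration:

```python
import numpy as np

# Hypothetical Level-2 pixels: AOD values with per-pixel QC flags
# (0 = bad ... 3 = best, mimicking a MODIS-style confidence flag).
aod = np.array([0.21, 0.35, 1.20, 0.18, 0.40])
qc  = np.array([3,    1,    0,    3,    2])

# Keep only pixels the retrieval algorithm was "happy" with (QC >= 2);
# everything else becomes a fill value (NaN) before analysis.
screened = np.where(qc >= 2, aod, np.nan)
mean_screened = np.nanmean(screened)
```

Note that this is quality *control*: it uses only the algorithm's own flags, and says nothing about bias or uncertainty, which come from after-the-fact quality *assessment*.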

Different kinds of reported data quality
– Pixel-level quality: an algorithmic guess at the usability of a data point.
  – Granule-level quality: a statistical roll-up of pixel-level quality.
– Product-level quality: how closely the data represent the actual geophysical state.
– Record-level quality: how consistent and reliable the data record is across generations of measurements.
Different quality types are often erroneously assumed to have the same meaning. Ensuring data quality at these different levels requires different focus and action.
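A granule-level roll-up, as described above, is just summary statistics over the pixel-level flags. A sketch with hypothetical flag values:

```python
import numpy as np

# Hypothetical pixel-level confidence flags for one granule
# (0 = bad, 1 = marginal, 2 = good, 3 = best).
flags = np.array([3, 3, 2, 0, 1, 3, 2, 2, 0, 3])

# Granule-level quality as a statistical roll-up of the pixel flags:
pct_usable = 100.0 * np.mean(flags >= 2)       # share of good/best pixels
flag_counts = np.bincount(flags, minlength=4)  # histogram over flag values
```

Such a roll-up supports discovery-time decisions ("is this granule worth downloading?") but, being pixel-flag-based, still says nothing about product- or record-level quality.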

Three projects with a data & information quality flavor
– Multi-sensor Data Synergy Advisor (**) – product level
  – Goal: provide science users with clear, cogent information on salient differences between data candidates for fusion, merging and intercomparison, and enable scientifically and statistically valid conclusions.
  – Develop MDSA on current missions – Terra, Aqua (maybe Aura); define implications for future missions.
– Data Quality Screening Service – pixel level
– Aerosol Status – record level

Giovanni: Earth Science Data Visualization & Analysis Tool
– Developed and hosted by NASA Goddard Space Flight Center (GSFC)
– Multi-sensor and model data analysis and visualization online tool
– Supports dozens of visualization types
– Generates dataset comparisons
– ~1500 parameters
– Used by modelers, researchers, policy makers, students, teachers, etc.

Giovanni Allows Scientists to Concentrate on the Science
Web-based tools like Giovanni allow scientists to compress the time needed for pre-science preliminary tasks: data discovery, access, manipulation, visualization, and basic statistical analysis.
– The Old Way (months of pre-science work before any science gets done): find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept/discard/get more data (satellite, model, ground-based); only then explore, derive conclusions, write and submit the paper.
– The Giovanni Way (minutes): Mirador and Giovanni handle reading data, reformatting, re-projection, parameter extraction, spatial subsetting, quality filtering, exploration, analysis and visualization, so days of exploration lead straight to using the best data for the final analysis. Scientists have more time to DO SCIENCE!

Data Usage Workflow: Data Discovery → Assessment → Access → Manipulation → Visualization → Analyze

Data Usage Workflow: Data Discovery → Assessment → Access → Manipulation (Integration, Reformat, Re-project, Filtering, Subset/Constrain) → Visualization → Analyze

Data Usage Workflow: Data Discovery → Assessment (Integration Planning, Precision Requirements, Quality Assessment Requirements, Intended Use) → Access → Manipulation (Integration, Reformat, Re-project, Filtering, Subset/Constrain) → Visualization → Analyze

Informatics approach
– Systematizing quality aspects:
  – working through the literature;
  – identifying aspects of quality and their dependence on measurement and environmental conditions;
  – developing Data Quality ontologies;
  – understanding and collecting internal and external provenance.
– Developing rulesets allows us to infer which pieces of knowledge to extract and assemble.
– Presenting the data quality knowledge with good visuals, statements and references.

Data Quality Ontology Development (Quality flag)
Working together with Chris Lynnes's DQSS project; started from the pixel-level quality view.

Data Quality Ontology Development (Bias)
?rid= _ _22228&partName=htmltext

Modeling quality (Uncertainty)
Link to other cmap presentations of the quality ontology: ResourceServlet?rid= _ _19570&partName=htmltext

MDSA Aerosol Data Ontology Example
Ontology of Aerosol Data made with the cmap ontology editor

RuleSet Development

[DiffNEQCT:
  (?s rdf:type gio:RequestedService),
  (?s gio:input ?a), (?a rdf:type gio:DataSelection),
  (?s gio:input ?b), (?b rdf:type gio:DataSelection),
  (?a gio:sourceDataset ?a.ds), (?b gio:sourceDataset ?b.ds),
  (?a.ds gio:fromDeployment ?a.dply), (?b.ds gio:fromDeployment ?b.dply),
  (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
  (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
  notEqual(?a.neqct, ?b.neqct)
  ->
  (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)]
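The rule's logic – warn when two sun-synchronous data sources cross the equator at different nominal times – can be emulated without a Jena-style engine. The sketch below uses a toy in-memory triple store (not the Advisor's actual KB); the property names follow the slide's rule, and the crossing times are the well-known nominal values for Terra (~10:30) and Aqua (~13:30):

```python
# Toy triple store standing in for the Advisor KB.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

def value(s, p):
    """Return one object for (s, p), like a graph lookup in Jena or rdflib."""
    return next((o for (s2, p2, o) in triples if (s2, p2) == (s, p)), None)

# Two data selections whose datasets come from sun-synchronous deployments.
add("selA", "gio:sourceDataset", "dsA")
add("dsA", "gio:fromDeployment", "terra")
add("terra", "gio:hasNominalEquatorialCrossingTime", "10:30")
add("selB", "gio:sourceDataset", "dsB")
add("dsB", "gio:fromDeployment", "aqua")
add("aqua", "gio:hasNominalEquatorialCrossingTime", "13:30")

def neqct(selection):
    """Walk selection -> dataset -> deployment -> crossing time."""
    deployment = value(value(selection, "gio:sourceDataset"),
                       "gio:fromDeployment")
    return value(deployment, "gio:hasNominalEquatorialCrossingTime")

# Mirrors the DiffNEQCT rule body: issue an advisory when the times differ.
advisory_needed = neqct("selA") != neqct("selB")
```

In the real system the consequent would assert a gio:issueAdvisory triple, which downstream components render into the Advisory Report.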

Assisting in Assessment
Data Usage Workflow: Data Discovery → Assessment (Integration Planning, Precision Requirements, Quality Assessment Requirements, Intended Use) → Access → Manipulation (Integration, Reformat, Re-project, Filtering, Subset/Constrain) → Visualization → Analyze → Re-Assessment
MDSA contributes the Advisory Report and Provenance & Lineage Visualization.

Advisor Knowledge Base
Advisor Rules test for potential anomalies and create associations between service metadata and anomaly metadata in the Advisor KB.

Semantic Advisor Architecture (RPI)

Advisory Report (Dimension Comparison Detail)

Advisory Report (Expert Advisories Detail)

Summary
– Quality is very hard to characterize; different groups will focus on different and inconsistent measures of quality.
  – Modern ontology representations to the rescue!
– Products with known quality (whether good or bad) are more valuable than products with unknown quality.
  – Known quality helps you correctly assess fitness-for-use.
– Harmonization of data quality is even more difficult than characterizing the quality of a single data product.

Summary
– The Advisory Report is not a replacement for proper analysis planning.
  – But it benefits all user types by summarizing general fitness-for-use, integrability, and data usage caveat information.
  – Science user feedback has been very positive.
– Provenance trace dumps are difficult to read, especially for non-software engineers.
  – Science user feedback: "Too much information in the provenance lineage; I need a simplified abstraction/view."
– Transparency → Translucency: make the important stuff stand out.

Future Work
– Advisor suggestions to correct for potential anomalies
– Views/abstractions of provenance based on specific user group requirements
– Continued iteration on visualization tools based on user requirements
– Present a comparability index / research techniques to quantify comparability

Extra material

Acronyms
ACCESS – Advancing Collaborative Connections for Earth System Science
ACE – Aerosol-Cloud-Ecosystems
AGU – American Geophysical Union
AIST – Advanced Information Systems Technology
AOD – Aerosol Optical Depth
AVHRR – Advanced Very High Resolution Radiometer
GACM – Global Atmospheric Composition Mission
GeoCAPE – Geostationary Coastal and Air Pollution Events
GEWEX – Global Energy and Water Cycle Experiment
GOES – Geostationary Operational Environmental Satellite
GOME-2 – Global Ozone Monitoring Experiment-2
JPSS – Joint Polar Satellite System
LST – Local Solar Time
MDSA – Multi-sensor Data Synergy Advisor
MISR – Multiangle Imaging SpectroRadiometer
MODIS – Moderate Resolution Imaging Spectroradiometer
NPP – National Polar-Orbiting Operational Environmental Satellite System Preparatory Project

Acronyms (cont.)
OMI – Ozone Monitoring Instrument
OWL – Web Ontology Language
PML – Proof Markup Language
QA4EO – QA for Earth Observations
REST – Representational State Transfer
TRL – Technology Readiness Level
UTC – Coordinated Universal Time
WADL – Web Application Description Language
XML – eXtensible Markup Language
XSL – eXtensible Stylesheet Language
XSLT – XSL Transformation

Quality & Bias assessment using FreeMind, from the Aerosol Parameter Ontology
FreeMind allows capturing the various relations between aspects of aerosol measurements, algorithms, conditions, validation, etc. "Traditional" worksheets do not support the complex, multi-dimensional nature of the task.

Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4.
Title: MODIS Terra C5 AOD vs. AERONET during Aug–Oct biomass burning in Central Brazil, South America.
(General statement) Collection 5 MODIS AOD at 550 nm during Aug–Oct over central South America highly over-estimates at large AOD and, in the non-burning season, under-estimates at small AOD, as compared to AERONET; good comparisons are found at moderate AOD.
(Region & season characteristics) The central region of Brazil is a mix of forest, cerrado, and pasture, and is known to have low AOD most of the year except during the biomass burning season.
(Example) Scatter plot of MODIS AOD vs. AERONET AOD at 550 nm from the reference (Hyer et al., 2011).
(Description caption) Shows severe over-estimation of MODIS Collection 5 AOD (dark target algorithm) at large AOD at 550 nm during Aug–Oct over Brazil.
(Constraints) Only the best quality MODIS data (Quality = 3) were used; data with scattering angle > 170° were excluded.
(Symbols) Red lines define the region of Expected Error (EE); green is the fitted slope.
Results: Tolerance = 62% within EE; RMSE = 0.212; r² = 0.81; Slope = 1.00. For low AOD (… 1.4): Slope = 1.54.
(Dominating factors leading to the aerosol estimate bias)
1. The large positive bias in the AOD estimate during the biomass burning season may be due to wrong assignment of aerosol absorbing characteristics. (Specific explanation) A constant single scattering albedo of ~0.91 is assigned for all seasons, while the true value is closer to ~… [Notes or exceptions: biomass burning regions in Southern Africa do not show as large a positive bias as in this case; this may be due to different optical characteristics or single scattering albedo of the smoke particles. AERONET observations of SSA confirm this.]
2. Low AOD is common in the non-burning season. In low-AOD cases, biases are highly dependent on lower boundary conditions. In general a negative bias is found, due to uncertainty in the surface reflectance characterization, which dominates if the signal from atmospheric aerosol is low.
(Figure: AERONET AOD, central South America – Mato Grosso, Santa Cruz, Alta Floresta)
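Metrics like "Tolerance = 62% within EE" come from counting matchups that fall inside an expected-error envelope; for MODIS over land this envelope is commonly quoted as ±(0.05 + 0.15·AOD). The matchup values below are invented for illustration and are not the Hyer et al. data:

```python
import numpy as np

# Invented MODIS vs. AERONET AOD matchups at 550 nm (illustration only).
modis   = np.array([0.08, 0.25, 0.55, 1.60, 0.95])
aeronet = np.array([0.10, 0.22, 0.50, 1.10, 0.90])

# Commonly quoted MODIS over-land expected-error envelope.
ee = 0.05 + 0.15 * aeronet
within_ee = np.abs(modis - aeronet) <= ee
tolerance_pct = 100.0 * within_ee.mean()   # "% within EE"

rmse  = np.sqrt(np.mean((modis - aeronet) ** 2))
slope = np.polyfit(aeronet, modis, 1)[0]   # fitted slope (the green line)
```

In this toy sample the single large-AOD over-estimate falls outside the envelope and pulls the fitted slope above 1, the same qualitative behavior the slide attributes to the biomass burning season.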