Experiences Developing a User-centric Presentation of Provenance for a Web-based Science Data Analysis Tool Stephan Zednik 1, Gregory Leptoukh 2, Peter.


1 Experiences Developing a User-centric Presentation of Provenance for a Web-based Science Data Analysis Tool. Stephan Zednik (1), Gregory Leptoukh (2), Peter Fox (1), Chris Lynnes (2), Jianfu Pan (3). (1) Tetherless World Constellation, Rensselaer Polytechnic Inst., Troy, NY, United States; (2) NASA Goddard Space Flight Center, Greenbelt, MD, United States; (3) Adnet Systems, Inc. ESSI12, EGU2011-4928

2 Giovanni Earth Science Data Visualization & Analysis Tool: Developed and hosted by NASA/Goddard Space Flight Center (GSFC). Multi-sensor and model data analysis and visualization online tool. Supports dozens of visualization types. Generates dataset comparisons. ~1500 parameters. Used by modelers, researchers, policy makers, students, teachers, etc.

3 Data Usage Workflow: Data Discovery, Assessment, Access, Manipulation, Visualization, Analyze

4 Data Usage Workflow: Data Discovery, Assessment, Access, Manipulation, Visualization, Analyze. Callouts: Integration; Reformat; Re-project; Filtering; Subset / Constrain

5 Data Usage Workflow: Data Discovery, Assessment, Access, Manipulation, Visualization, Analyze. Callouts: Integration Planning; Precision Requirements; Quality Assessment Requirements; Intended Use. Integration; Reformat; Re-project; Filtering; Subset / Constrain

6 Challenge: Giovanni streamlines data processing, performing required actions on behalf of the user, but automation amplifies the potential for users to generate and use results they do not fully understand. The assessment stage is integral for the user to understand the fitness-for-use of the result, but Giovanni does not assist in assessment. We are challenged to instrument the system to help users understand results.

7 Anomaly Example: South Pacific Anomaly

8 …is caused by an Overpass Time Difference

9 Multi-Sensor Data Synergy Advisor (MDSA): Assemble semantic knowledge base: Giovanni service selections; data source provenance (external provenance, low detail); Giovanni planned operations (what the service intends to do). Analyze service plan: Are we integrating/comparing/synthesizing? Are similar dimensions in data sources semantically comparable (semantic diff)? How comparable (semantic distance)? What data usage caveats exist for data sources? Advise regarding general fitness-for-use and data-usage caveats.
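
The "semantic diff" and "semantic distance" questions above can be illustrated with a minimal Python sketch. It assumes each data source is described by a flat mapping of dimension names to values; the property names, the example values, and the distance measure (the fraction of shared dimensions that differ) are all illustrative assumptions, not the MDSA implementation, which reasons over a semantic knowledge base.

```python
def semantic_diff(source_a, source_b):
    """Return the shared dimensions whose values differ, as {dimension: (a, b)}."""
    shared = source_a.keys() & source_b.keys()
    return {
        dim: (source_a[dim], source_b[dim])
        for dim in sorted(shared)
        if source_a[dim] != source_b[dim]
    }


def semantic_distance(source_a, source_b):
    """Toy distance: fraction of shared dimensions that differ (0.0 = identical)."""
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return 1.0  # nothing in common to compare
    return len(semantic_diff(source_a, source_b)) / len(shared)


# Hypothetical source descriptions; names and values are for illustration only.
terra = {
    "parameter": "aerosol optical depth",
    "spatial_resolution_deg": 1.0,
    "temporal_resolution": "daily",
    "overpass_local_time": "10:30",
}
aqua = {
    "parameter": "aerosol optical depth",
    "spatial_resolution_deg": 1.0,
    "temporal_resolution": "daily",
    "overpass_local_time": "13:30",
}

print(semantic_diff(terra, aqua))
print(semantic_distance(terra, aqua))
```

A real integrability measure would likely weight dimensions by how strongly they affect comparability; the equal weighting here is only the simplest possible choice.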

10 Assisting in Assessment: Data Discovery, Assessment, Access, Manipulation, Visualization, Analyze, Re-Assessment. Callouts: Integration Planning; Precision Requirements; Quality Assessment Requirements; Intended Use. Integration; Reformat; Re-project; Filtering; Subset / Constrain. MDSA Advisory Report; Provenance & Lineage Visualization

11 Multi-Domain Knowledgebase: Provenance Domain, Earth Science Domain, Data Processing Domain

12 Advisor Knowledge Base: Advisor Rules test for potential anomalies and create associations between service metadata and anomaly metadata in the Advisor KB
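
As a rough illustration of the rule mechanism described above, the following Python sketch runs a rule against a service plan and records an association in a knowledge base when the rule fires. The `AdvisorKB` class, the plan layout, the rule itself, and all identifiers are hypothetical; the slides do not specify how the actual Advisor rules or KB are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class AdvisorKB:
    """Hypothetical stand-in for the Advisor KB: a list of (service, anomaly) links."""
    associations: list = field(default_factory=list)

    def associate(self, service_id, anomaly_id):
        self.associations.append((service_id, anomaly_id))


def overpass_time_rule(plan):
    """Flag daily comparisons of sources observed at different local overpass times."""
    times = {src["overpass_local_time"] for src in plan["sources"]}
    if (plan["operation"] == "compare"
            and plan["temporal_resolution"] == "daily"
            and len(times) > 1):
        return "overpass-time-difference"
    return None


def run_rules(plan, rules, kb):
    """Apply every rule to the plan; link the service to each anomaly that is found."""
    for rule in rules:
        anomaly_id = rule(plan)
        if anomaly_id is not None:
            kb.associate(plan["service_id"], anomaly_id)
    return kb


# Hypothetical service plan; all values are for illustration only.
plan = {
    "service_id": "svc-1",
    "operation": "compare",
    "temporal_resolution": "daily",
    "sources": [
        {"name": "MODIS-Terra", "overpass_local_time": "10:30"},
        {"name": "MODIS-Aqua", "overpass_local_time": "13:30"},
    ],
}
kb = run_rules(plan, [overpass_time_rule], AdvisorKB())
print(kb.associations)
```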

13 Advisor Presentation Requirements: Present metadata that can affect the fitness-for-use of the result. When comparing or integrating data sources: make obvious which properties are comparable; highlight differences (that affect comparability) where present. Present descriptive text (and, if possible, visuals) for any data usage caveats highlighted by the expert ruleset. Presentation must be understandable by Earth scientists.

14 Advisory Report: Tabular representation of the semantic equivalence of comparable data source and processing properties. Advises of and describes potential data anomalies/bias.
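
A tabular comparison of this kind could be rendered as plain text along the following lines. This is a sketch under stated assumptions: the column layout, the yes/NO marker, and the example properties are invented for illustration and do not reproduce the actual Advisory Report.

```python
def advisory_table(name_a, props_a, name_b, props_b):
    """Render the shared properties of two sources as an aligned plain-text table."""
    rows = [("Property", name_a, name_b, "Equivalent")]
    for prop in sorted(props_a.keys() & props_b.keys()):
        a, b = str(props_a[prop]), str(props_b[prop])
        # Differences are upper-cased so they stand out when scanning the column.
        rows.append((prop, a, b, "yes" if a == b else "NO"))
    widths = [max(len(row[i]) for row in rows) for i in range(4)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Hypothetical properties; names and values are for illustration only.
print(advisory_table(
    "Terra", {"temporal_resolution": "daily", "overpass_local_time": "10:30"},
    "Aqua", {"temporal_resolution": "daily", "overpass_local_time": "13:30"},
))
```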

15 Advisory Report (Dimension Comparison Detail)

16 Advisory Report (Expert Advisories Detail)

17 Giovanni Provenance Visualization Requirements: Exercise existing provenance visualization tools to show Giovanni processing provenance. Visualization tool must support access to the multi-domain metadata knowledgebase (not just provenance metadata); science metadata adds domain context to entities in the provenance trace. Presentation must be understandable by Earth scientists.

18 Domain-integrated Provenance Visualization

19 Domain-integrated detail view

20 Conclusion: The Advisory Report is not a replacement for proper analysis planning, but it benefits all user types by summarizing general fitness-for-usage, integrability, and data usage caveat information; science user feedback has been very positive. Provenance trace dumps are difficult to read, especially for non-software engineers; science user feedback: "Too much information in provenance lineage, I need a simplified abstraction/view". Transparency → Translucency: make the important stuff stand out. Semantic Distance / Integrability Index is non-trivial.

21 Future Work: Advisor suggestions to correct for potential anomalies; views/abstractions of provenance based on specific user group requirements; continued iteration on visualization tools based on user requirements; present a comparability index / research techniques to quantify comparability.

22 References: Leptoukh, G., Lary, D., Shen, S., & Lynnes, C. (2010). Impact of Day Definition on Daily Correlative Studies. MODIS Science Team Meeting, January 2010. Zednik, S., Fox, P., & McGuinness, D. (2010). System Transparency, or How I Learned to Worry about Meaning and Love Provenance! 3rd International Provenance and Annotation Workshop, Troy, NY.

23 Links: Giovanni Earth Science Data Analysis Tool: http://disc.sci.gsfc.nasa.gov/giovanni/ (Production site); http://giovanniplus-ts1.sci.gsfc.nasa.gov/daac-bin/G3/gui.cgi?instance_id=MDSA-case1 (MDSA site). MDSA: http://tw.rpi.edu/web/project/MDSA (Project site). PML: http://inference-web.org (Inference Web); http://inference-web.org/2007/primer/ (PML Primer, 2007)

24 Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379-408, doi:10.5194/amt-4-379-2011
Title: MODIS Terra C5 AOD vs. Aeronet during Aug-Oct biomass burning in Central Brazil, South America
(General) Statement: Collection 5 MODIS AOD at 550 nm during Aug-Oct over Central South America is highly overestimated at large AOD and, in the non-burning season, underestimated at small AOD, as compared to Aeronet; good comparisons are found at moderate AOD.
Region & season characteristics: The central region of Brazil is a mix of forest, cerrado, and pasture and is known to have low AOD most of the year except during the biomass burning season.
(Example): (Title) Scatter plot of MODIS AOD and AOD at 550 nm vs. Aeronet from ref. (Hyer et al., 2011). (Description Caption) Shows severe overestimation of MODIS Col 5 AOD (dark target algorithm) at large AOD at 550 nm during Aug-Oct 2005-2008 over Brazil. (Constraints) Only best-quality MODIS data (Quality = 3) used. Data with large scattering angle (> 170 deg) excluded. (Symbol description) Red lines define regions of Expected Error (EE); green is the fitted slope.
Results: Tolerance = 62% within EE; RMSE = 0.212; r2 = 0.81; Slope = 1.00. For low AOD (< 0.2), Slope = 0.3 (i.e., MODIS AOD is about one third of Aeronet AOD). For high AOD (> 1.4), Slope = 1.54. (Specific explanation) because of uncertainty introduced in AOD retrieval due to the hot spot effect, which is not taken into account in the MODIS retrieval algorithm.
(Dominating factors leading to Aerosol Estimate bias): 1. Large positive bias in the AOD estimate during the biomass burning season may be due to incorrect assignment of aerosol absorbing characteristics (a constant Single Scattering Albedo ~0.91 is assigned for all seasons, while the true value is closer to ~0.92-0.93). [Notes or exceptions: Biomass burning regions in Southern Africa do not show as large a positive bias as in this case; this may be due to different optical characteristics or single scattering albedo of smoke particles. Aeronet observations of SSA confirm this.] 2. Low AOD is common in the non-burning season. In low-AOD cases, biases are highly dependent on lower boundary conditions. In general, a negative bias is found due to uncertainty in surface reflectance characterization, which dominates if the signal from atmospheric aerosol is low.
[Figure: scatter plot; axis: Aeronet AOD (ticks 0, 1, 2); sites in Central South America: Mato Grosso, Santa Cruz, Alta Floresta]
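
The summary statistics quoted in the Results above (fraction within EE, RMSE, r2, slope) can be computed from paired observations roughly as in the Python sketch below. It makes two assumptions that the slide does not confirm: the slope is an ordinary least-squares fit of MODIS AOD on Aeronet AOD, and the Expected Error envelope is ±(0.05 + 0.15 × AOD), a form commonly quoted for MODIS over-land retrievals. The function names are hypothetical.

```python
import math


def expected_error(aeronet_aod):
    """Assumed EE half-width: 0.05 + 0.15 * AOD (not confirmed by the slide)."""
    return 0.05 + 0.15 * aeronet_aod


def comparison_stats(modis, aeronet):
    """Slope, r2, RMSE, and fraction within EE for paired MODIS/Aeronet AOD values."""
    n = len(modis)
    mean_x = sum(aeronet) / n
    mean_y = sum(modis) / n
    sxx = sum((x - mean_x) ** 2 for x in aeronet)
    syy = sum((y - mean_y) ** 2 for y in modis)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aeronet, modis))
    if sxx == 0 or syy == 0:
        raise ValueError("both series need some variation to fit a slope and r2")
    within = sum(abs(y - x) <= expected_error(x) for x, y in zip(aeronet, modis))
    return {
        "slope": sxy / sxx,                # least-squares slope, MODIS vs. Aeronet
        "r2": sxy ** 2 / (sxx * syy),      # squared Pearson correlation
        "rmse": math.sqrt(sum((y - x) ** 2 for x, y in zip(aeronet, modis)) / n),
        "within_ee": within / n,           # fraction of pairs inside the envelope
    }
```

Restricting the input pairs to a subrange (for example Aeronet AOD below 0.2 or above 1.4) before calling `comparison_stats` would give per-regime slopes analogous to those listed on the slide.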

