1 Peter Fox Xinformatics Week 10, April 2, 2012 Worked example

2 Contents
– Review of last class, reading
– Information quality, uncertainty (again) and bias (again) in a project setting
– Your projects?
– Last few classes

3 The semantics of data and information quality, uncertainty and bias and applications in atmospheric science. Peter Fox, and … Stephan Zednik (1), Gregory Leptoukh (2), Chris Lynnes (2), Jianfu Pan (3). 1. Tetherless World Constellation, Rensselaer Polytechnic Institute; 2. NASA Goddard Space Flight Center, Greenbelt, MD, United States; 3. Adnet Systems, Inc.

4 Definitions (Atmospheric)
Quality – is in the eyes of the beholder – worst case scenario… or a good challenge.
Uncertainty – has aspects of accuracy (how accurately the real-world situation is assessed; it also includes bias) and precision (down to how many digits).
Bias has two aspects:
– Systematic error resulting in the distortion of measurement data, caused by prejudice or faulty measurement technique.
– A vested interest, or strongly held paradigm or condition that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
  Psychological: for example, when data providers audit their own data, they usually have a bias to overstate its quality.
  Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled. (Larry English)
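To make the accuracy/precision distinction concrete, here is a minimal numeric sketch (not from the slides; the sample values are invented for illustration): bias is estimated as the mean offset from a known reference value, precision as the spread of repeated measurements.

```python
# Minimal sketch: bias vs. precision for repeated measurements of a known truth.
# The numbers are illustrative only.
import statistics

true_value = 10.0
measurements = [10.4, 10.5, 10.3, 10.6, 10.4]   # tight spread, but shifted high

bias = statistics.mean(m - true_value for m in measurements)   # systematic offset
precision = statistics.stdev(measurements)                     # spread of repeats

print(f"bias = {bias:+.2f}")           # ~ +0.44: systematically high (accuracy problem)
print(f"precision = {precision:.2f}")  # ~ 0.11: repeats agree closely (precise anyway)
```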

5 Acronyms
ACCESS – Advancing Collaborative Connections for Earth System Science
ACE – Aerosol-Cloud-Ecosystems
AGU – American Geophysical Union
AIST – Advanced Information Systems Technology
AOD – Aerosol Optical Depth
AVHRR – Advanced Very High Resolution Radiometer
GACM – Global Atmospheric Composition Mission
GeoCAPE – Geostationary Coastal and Air Pollution Events
GEWEX – Global Energy and Water Cycle Experiment
GOES – Geostationary Operational Environmental Satellite
GOME-2 – Global Ozone Monitoring Experiment-2
JPSS – Joint Polar Satellite System
LST – Local Solar Time
MDSA – Multi-sensor Data Synergy Advisor
MISR – Multiangle Imaging SpectroRadiometer
MODIS – Moderate Resolution Imaging Spectroradiometer
NPP – National Polar-orbiting Operational Environmental Satellite System Preparatory Project

6 Acronyms (cont.)
OMI – Ozone Monitoring Instrument
OWL – Web Ontology Language
PML – Proof Markup Language
QA4EO – QA for Earth Observations
REST – Representational State Transfer
TRL – Technology Readiness Level
UTC – Coordinated Universal Time
WADL – Web Application Description Language
XML – eXtensible Markup Language
XSL – eXtensible Stylesheet Language
XSLT – XSL Transformation

7 Data quality needs: fitness for purpose
Measuring climate change:
– Model validation: gridded contiguous data with uncertainties
– Long-term time series: bias assessment is a must, especially sensor degradation, orbit and spatial sampling change
Studying phenomena using multi-sensor data:
– Cross-sensor bias is needed
Realizing societal benefits through applications:
– Near-real-time for transport/event monitoring – in some cases, coverage and timeliness might be more important than accuracy
– Pollution monitoring (e.g., air quality exceedance levels) – accuracy
Educational (users are generally not well-versed in the intricacies of quality; just taking all the data as usable can impair educational lessons) – only the best products

8 Where are we in respect to this data challenge? "The user cannot find the data; if he can find it, he cannot access it; if he can access it, he doesn't know how good they are; if he finds them good, he cannot merge them with other data." – The Users View of IT, NAS 1989

9 Level 2 data 9

10 Swath for MISR, orbit 192 (2001) 10

11 Level 3 data 11

12 MODIS vs. MERIS: same parameter, same space & time, different results – why?
[Maps: MODIS AOD vs. MERIS AOD over the same region and period]
A threshold used in MERIS processing effectively excludes high aerosol values. Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are "obstacles", not signal.
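A hypothetical sketch of the kind of processing artifact described above: if one processing chain caps or discards retrievals above some threshold, regional statistics from the two products diverge even though they observe the same scene. The threshold value and the sample AOD values are invented for illustration, not taken from MERIS processing.

```python
# Illustrative only: how an upper threshold in one processing chain can
# exclude high aerosol values and shift the regional mean.
aod_scene = [0.1, 0.2, 0.15, 1.2, 1.8, 0.3, 2.1, 0.25]   # same "true" scene for both products

product_a = aod_scene                                      # no cap applied
threshold = 1.0                                            # hypothetical cap in product B's chain
product_b = [v for v in aod_scene if v <= threshold]       # high-AOD retrievals excluded

mean = lambda xs: sum(xs) / len(xs)
print(f"product A mean AOD = {mean(product_a):.2f}")       # includes smoke/dust events
print(f"product B mean AOD = {mean(product_b):.2f}")       # biased low over the same scene
```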

13 Why so difficult?
– Quality is perceived differently by data providers and data recipients.
– There are many different qualitative and quantitative aspects of quality.
– Methodologies for dealing with data quality are only just emerging; almost nothing exists for remote sensing data quality.
– Even the most comprehensive review, Batini's book, demonstrates that there are no preferred methodologies for solving many data quality issues.
– Little funding was allocated in the past to data quality, as the priority was to build an instrument, launch a rocket, collect and process data, and publish a paper using just one set of data. Each science team handled quality differently.

14 Spatial and temporal sampling – how to quantify it to make it useful for modelers?
– Completeness: the MODIS dark-target algorithm does not work for deserts.
– Representativeness: monthly aggregation is not enough for MISR, and even for MODIS.
– Spatial sampling patterns are different for MODIS Aqua and MISR Terra: "pulsating" areas over ocean are oriented differently due to different orbital direction during day-time measurement → cognitive bias.
[Maps: MODIS Aqua AOD, July 2009; MISR Terra AOD, July 2009]

15 Quality Control vs. Quality Assessment
Quality Control (QC) flags in the data (assigned by the algorithm) reflect the "happiness" of the retrieval algorithm, e.g., all the necessary channels indeed had data, there were not too many clouds, the algorithm converged to a solution, etc.
Quality assessment is done by analyzing the data "after the fact" through validation, intercomparison with other measurements, self-consistency, etc. It is presented as bias and uncertainty. It is rather inconsistent and can be found in papers and validation reports all over the place.

16 Challenges in dealing with data quality
Q: Why now? What has changed?
A: With the recent revolutionary progress in data systems, dealing with data from many different sensors has finally become a reality. Only now is a systematic approach to remote sensing quality on the table.
– NASA is beefing up efforts on data quality.
– ESA is seriously addressing these issues.
– QA4EO: an international effort to bring communities together on data quality.
– Many information and computer science research questions remain.

17 Intercomparison of data from multiple sensors
Data from multiple sources to be used together:
– Current sensors/missions: MODIS, MISR, GOES, OMI
– Future decadal missions: ACE, NPP, JPSS, GeoCAPE
– European and other countries' satellites
– Models
Harmonization needs: it is not sufficient just to have the data from different sensors and their provenances in one place. Before comparing and fusing data, things need to be harmonized (a minimal pre-check is sketched below):
– Metadata: terminology, standard fields, units, scale
– Data: format, grid, spatial and temporal resolution, wavelength, etc.
– Provenance: source, assumptions, algorithm, processing steps
– Quality: bias, uncertainty, fitness-for-purpose, validation
Dangers of easy data access without proper assessment of the joint data usage – it is easy to use data incorrectly.
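As a minimal sketch of the harmonization pre-check referenced above, the snippet below compares a few metadata fields (units, grid resolution, wavelength) of two dataset descriptions before allowing a comparison. The field names and values are hypothetical, not Giovanni's actual metadata model.

```python
# Minimal sketch: refuse to compare two datasets until basic metadata agree.
# Field names and values are hypothetical.
modis = {"parameter": "AOD", "units": "1", "grid_deg": 1.0, "wavelength_nm": 550}
misr  = {"parameter": "AOD", "units": "1", "grid_deg": 0.5, "wavelength_nm": 558}

def harmonization_issues(a, b, fields=("parameter", "units", "grid_deg", "wavelength_nm")):
    """Return the fields on which the two dataset descriptions disagree."""
    return [f for f in fields if a.get(f) != b.get(f)]

issues = harmonization_issues(modis, misr)
if issues:
    print("Harmonize before comparing; mismatched fields:", issues)   # e.g. grid_deg, wavelength_nm
else:
    print("Metadata comparable; proceed to intercomparison.")
```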

18 Anomaly example: South Pacific. The MODIS Level 3 data-day definition leads to an artifact in the correlation.

19 …is caused by an Overpass Time Difference 19

20 Different kinds of reported data quality
– Pixel-level quality: algorithmic guess at the usability of a data point.
– Granule-level quality: statistical roll-up of pixel-level quality (a minimal roll-up sketch follows below).
– Product-level quality: how closely the data represent the actual geophysical state.
– Record-level quality: how consistent and reliable the data record is across generations of measurements.
Different quality types are often erroneously assumed to have the same meaning.
Ensuring data quality at these different levels requires different focus and action.
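Here is the minimal roll-up sketch referenced above: per-pixel quality flags are summarized into granule-level percentages. The flag convention is assumed for illustration (0 = best, 1 = good, 2 = do not use), mirroring the AIRS-style convention used later in the deck.

```python
# Minimal sketch: roll up pixel-level quality flags into a granule-level summary.
# Assumed convention: 0 = best, 1 = good, 2 = do not use.
from collections import Counter

pixel_flags = [0, 0, 1, 2, 0, 1, 1, 2, 2, 0, 1, 0]   # illustrative granule of 12 pixels

counts = Counter(pixel_flags)
total = len(pixel_flags)
granule_summary = {flag: round(100.0 * counts[flag] / total, 1) for flag in sorted(counts)}

print(granule_summary)   # {0: 41.7, 1: 33.3, 2: 25.0} -- percent of pixels per quality level
```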

21 Sensitivity of the aerosol–chlorophyll relationship to the data-day definition
[Maps: correlation coefficients, MODIS AOT at 550 nm vs. SeaWiFS Chl; difference between correlation A and B, where A = MODIS AOT on the LST data day vs. SeaWiFS Chl and B = MODIS AOT on the UTC data day vs. SeaWiFS Chl]
Artifact: the difference between using LST and the calendar (UTC-based) data day.
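To illustrate the two data-day definitions being compared, here is a small sketch that assigns an observation to a "data day" either by its UTC calendar date or by its local solar time (UTC shifted by longitude/15 hours). The observation time and longitude are invented; the point is that the two definitions can disagree, which is the source of the artifact.

```python
# Minimal sketch: UTC calendar data day vs. local-solar-time (LST) data day.
# Observation time and longitude are illustrative.
from datetime import datetime, timedelta

def utc_data_day(t_utc):
    return t_utc.date()

def lst_data_day(t_utc, lon_deg):
    # Local solar time approximated as UTC + longitude/15 hours.
    return (t_utc + timedelta(hours=lon_deg / 15.0)).date()

obs_time = datetime(2009, 7, 1, 23, 30)   # 23:30 UTC
obs_lon = 170.0                           # western Pacific

print("UTC data day:", utc_data_day(obs_time))           # 2009-07-01
print("LST data day:", lst_data_day(obs_time, obs_lon))  # 2009-07-02 -> lands in a different bin
```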

22 Sensitivity study: effect of the data-day definition on ocean color data correlation with aerosol data
[Maps: correlation between MODIS Aqua AOD (Ocean group product) and MODIS Aqua AOD (Atmosphere group product); pixel count distribution]
Only half of the data-day artifact is present because the Ocean group uses the better data-day definition!

23 Factors contributing to uncertainty and bias in Level 2
– Physical: instrument, retrieval algorithm, aerosol spatial and temporal variability…
– Input: ancillary data used by the retrieval algorithm
– Classification: erroneous flagging of the data
– Simulation: the geophysical model used for the retrieval
– Sampling: the averaging within the retrieval footprint
(Borrowed from the SST error-budget study)

24 What is Level 3 quality? It is not defined in Earth science….
– If Level 2 errors are known, the corresponding Level 3 error can be computed, in principle.
– Processing from L2 → L3 daily → L3 monthly may reduce random noise but can also exacerbate systematic bias and introduce additional sampling bias.
– However, at best, standard deviations (mostly reflecting variability within a grid box), and sometimes pixel counts and quality histograms, are provided (a toy aggregation sketch follows below).
– Convolution of natural variability with sensor/retrieval uncertainty and bias – we need to understand their relative contributions to differences between data. This does not address sampling bias.
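The toy aggregation sketch referenced above: binning Level 2 pixel values into one coarse grid cell and reporting the mean, standard deviation, and pixel count — the quantities the slide says are typically all that L3 products provide. The pixel values are invented for illustration.

```python
# Toy sketch: aggregate Level 2 pixels into one Level 3 grid cell.
# Pixel values are illustrative; a real product would bin by lat/lon.
import math

l2_pixels = [0.12, 0.15, 0.11, 0.45, 0.14, 0.13]   # one high value inside the cell

n = len(l2_pixels)
mean = sum(l2_pixels) / n
std = math.sqrt(sum((v - mean) ** 2 for v in l2_pixels) / (n - 1))

print(f"L3 cell: mean={mean:.3f}, std={std:.3f}, pixel_count={n}")
# The std mostly reflects variability within the cell; it says nothing about
# systematic (sampling or retrieval) bias, which is the slide's point.
```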

25 Why can’t we just apply L2 quality to L3? Aggregation to L3 introduces new issues where aerosols co-vary with some observing or environmental conditions – sampling bias: Spatial: sampling polar areas more than equatorial Temporal: sampling one time of a day only (not obvious when looking at L3 maps) Vertical: not sensitive to a certain part of the atmosphere thus emphasizing other parts Contextual: bright surface or clear sky bias Pixel Quality: filtering or weighting by quality may mask out areas with specific features

26 Addressing Level 3 data "quality"
Terminology: quality, uncertainty, bias, error budget, etc.
Quality aspects (examples):
– Completeness: spatial (MODIS covers more than MISR); temporal (the Terra mission has been in space longer than Aqua); observing condition (MODIS cannot measure over sun glint while MISR can)
– Consistency: spatial (e.g., not changing over the sea–land boundary); temporal (e.g., trends, discontinuities and anomalies); observing condition (e.g., variations in retrieved measurements due to the viewing conditions, such as viewing geometry or cloud fraction)
– Representativeness: neither pixel count nor standard deviation fully expresses the representativeness of the grid cell value
Data Quality Glossary development: http://tw.rpi.edu/web/project/MDSA/Glossary

27 More terminology ‘Even a slight difference in terminology can lead to significant differences between data from different sensors. It gives an IMPRESSION of data being of bad quality while in fact they measure different things. For example, MODIS and MISR definitions of the aerosol "fine mode" is different, so the direct comparison of fine modes from MODIS and MISR does not always give good correlation.’ Ralph Kahn, MISR Aerosol Lead. 27

28 Three projects with data quality flavor
– Multi-sensor Data Synergy Advisor (**) – product level
– Data Quality Screening Service – pixel level
– Aerosol Status – record level

29 Multi-Sensor Data Synergy Advisor (MDSA)
Goal: provide science users with clear, cogent information on salient differences between data candidates for fusion, merging and intercomparison
– enable scientifically and statistically valid conclusions
Develop MDSA on current missions: NASA – Terra, Aqua, (maybe Aura)
Define implications for future missions

30 How does MDSA work?
MDSA is a service designed to characterize the differences between two datasets and advise a user (human or machine) on the advisability of combining them. Within the Giovanni online analysis tool, it:
– describes parameters and products
– documents the steps leading to the final data product
– enables better interpretation and utilization of parameter difference and correlation visualizations
– provides clear and cogent information on salient differences between data candidates for intercomparison and fusion
– provides information on data quality
– provides advice on available options for further data processing and analysis

31 Giovanni: Earth Science Data Visualization & Analysis Tool
– Developed and hosted by NASA Goddard Space Flight Center (GSFC)
– Multi-sensor and model data analysis and visualization online tool
– Supports dozens of visualization types
– Generates dataset comparisons
– ~1500 parameters
– Used by modelers, researchers, policy makers, students, teachers, etc.

32 Giovanni allows scientists to concentrate on the science
Web-based tools like Giovanni allow scientists to compress the time needed for pre-science preliminary tasks: data discovery, access, manipulation, visualization, and basic statistical analysis.
The old way (the slide's timeline stretches the pre-science work across months, Jan–Oct): find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept/discard/get more data (satellite, model, ground-based); then days for exploration and initial analysis, use the best data for the final analysis, derive conclusions, write the paper, submit the paper – and only then DO SCIENCE.
The Giovanni way (minutes of web-based services): Mirador finds the data; Giovanni reads data, extracts parameters, subsets spatially, filters by quality, reformats, reprojects, analyzes, explores and visualizes; then use the best data for the final analysis, derive conclusions, write and submit the paper – DO SCIENCE.
Scientists have more time to do science!

33 Data usage workflow: Discovery → Assessment → Access → Manipulation → Visualization → Analysis.

34 Data usage workflow: Discovery → Assessment → Access → Manipulation (integration, reformat, re-project, filtering, subset/constrain) → Visualization → Analysis.

35 Data usage workflow: Discovery → Assessment (integration planning, precision requirements, quality assessment requirements, intended use) → Access → Manipulation (integration, reformat, re-project, filtering, subset/constrain) → Visualization → Analysis.

36 Challenge
Giovanni streamlines data processing, performing required actions on behalf of the user
– but automation amplifies the potential for users to generate and use results they do not fully understand.
The assessment stage is integral for the user to understand the fitness-for-use of the result
– but Giovanni does not assist in assessment.
We are challenged to instrument the system to help users understand results.

37 Advising users on product quality
We need to be able to advise MDSA users on:
– Which product is better? Or, a better-formulated question: which product has better quality over certain areas?
– How to address harmonization of quality across products
– How does sampling bias affect product quality?
  – Spatial: sampling polar areas more than equatorial
  – Temporal: sampling only one time of day
  – Vertical: not sensitive to a certain part of the atmosphere, thus emphasizing other parts
  – Pixel quality: filtering by quality may mask out areas with specific features
  – Clear sky: e.g., measuring humidity only where there are no clouds may lead to a dry bias
  – Surface type related

38 Research approach
Systematizing quality aspects:
– working through the literature
– identifying aspects of quality and their dependence on measurement and environmental conditions
– developing data quality ontologies
– understanding and collecting internal and external provenance
Developing rulesets that allow us to infer which pieces of knowledge to extract and assemble.
Presenting the data quality knowledge with good visuals, statements and references.

39 Data Quality Ontology Development (Quality flag) Working together with Chris Lynnes’s DQSS project, started from the pixel-level quality view.

40 Data Quality Ontology Development (Bias) http://cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1286316097170_183793435_22228&partName=htmltext

41 Modeling quality (Uncertainty) Link to other cmap presentations of the quality ontology: http://cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1299017667444_1897825847_19570&partName=htmltext

42 MDSA Aerosol Data Ontology Example Ontology of Aerosol Data made with cmap ontology editor

43 RuleSet Development
# Jena-style rule: if the two selected datasets come from sun-synchronous
# deployments with different nominal equatorial crossing times (NEQCT),
# attach a "different NEQCT" advisory to the requested service.
[DiffNEQCT:
  (?s rdf:type gio:RequestedService), (?s gio:input ?a), (?a rdf:type gio:DataSelection),
  (?s gio:input ?b), (?b rdf:type gio:DataSelection),
  (?a gio:sourceDataset ?a.ds), (?b gio:sourceDataset ?b.ds),
  (?a.ds gio:fromDeployment ?a.dply), (?b.ds gio:fromDeployment ?b.dply),
  (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
  (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
  notEqual(?a.neqct, ?b.neqct)
  ->
  (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)]

44 Thus – the Multi-Sensor Data Synergy Advisor
Assemble a semantic knowledge base:
– Giovanni service selections
– data source provenance (external provenance – low detail)
– Giovanni planned operations (what the service intends to do)
Analyze the service plan:
– Are we integrating/comparing/synthesizing? Are similar dimensions in the data sources semantically comparable (semantic diff)? How comparable (semantic distance)?
– What data usage caveats exist for the data sources?
Advise regarding general fitness-for-use and data-usage caveats.

45 Provenance Distance Computation
Provenance: origin or source from which something comes, intention for use, who/what it was generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility.
Based on provenance "distance", we tell users how different data products are (a toy weighted-similarity sketch follows below).
Issues:
– Computing the similarity of two provenance traces is non-trivial.
– Factors in provenance have varied weight on how comparable the results of processing are.
– Factors in provenance are interdependent in how they affect the final results of processing.
– We need to characterize the similarity of external (pre-Giovanni) provenance.
– The number of dimensions/factors that affect comparability is quickly overwhelming, and not all of these dimensions are independent – most of them are correlated with each other.
– Numerical studies comparing datasets can be used, when available and where applicable to Giovanni analysis.
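The toy weighted-similarity sketch referenced above scores how comparable two provenance descriptions are by matching a few factors with unequal weights. The factor names and weights are hypothetical; the slide's point is precisely that choosing and justifying such weights is non-trivial because the factors are interdependent.

```python
# Toy sketch: weighted similarity of two provenance descriptions.
# Factor names and weights are hypothetical.
trace_a = {"instrument": "MODIS", "algorithm_version": "C5", "ancillary_met": "GEOS-5", "gridding": "1deg"}
trace_b = {"instrument": "MISR",  "algorithm_version": "22", "ancillary_met": "GEOS-5", "gridding": "1deg"}

weights = {"instrument": 0.4, "algorithm_version": 0.3, "ancillary_met": 0.2, "gridding": 0.1}

def provenance_similarity(a, b, w):
    """Weighted fraction of matching provenance factors (1.0 = identical traces)."""
    return sum(w[f] for f in w if a.get(f) == b.get(f)) / sum(w.values())

print(f"similarity = {provenance_similarity(trace_a, trace_b, weights):.2f}")  # 0.30 here
```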

46 Assisting in assessment. Data usage workflow: Discovery → Assessment (integration planning, precision requirements, quality assessment requirements, intended use) → Access → Manipulation (integration, reformat, re-project, filtering, subset/constrain) → Visualization → Analysis → Re-assessment, supported by the MDSA Advisory Report and provenance & lineage visualization.

47 Multi-Domain Knowledgebase: provenance domain, Earth science domain, data processing domain.

48 Advisor Knowledge Base. Advisor rules test for potential anomalies and create associations between service metadata and anomaly metadata in the Advisor KB.

49 Semantic Advisor Architecture RPI

50 …. complexity 50

51 Presenting data quality to users
We split quality (viewed here broadly) into two categories:
– Global or product-level quality information, e.g. consistency, completeness, etc., that can be presented in a tabular form.
– Regional/seasonal. This is where we've tried various approaches: maps with outlined regions, one map per sensor/parameter/season; scatter plots with error estimates, one per combination of Aeronet station, parameter, and season, with different colors representing different wavelengths, etc.

52 Advisor Presentation Requirements
Present metadata that can affect the fitness for use of the result.
When comparing or integrating data sources:
– make obvious which properties are comparable
– highlight differences (that affect comparability) where present
Present descriptive text (and if possible visuals) for any data usage caveats highlighted by the expert ruleset.
Presentation must be understandable by Earth scientists!! Oh, you laugh…

53 Advisory Report
– Tabular representation of the semantic equivalence of comparable data source and processing properties
– Advice on, and description of, potential data anomalies/bias

54 Advisory Report (Dimension Comparison Detail) 54

55 Advisory Report (Expert Advisories Detail) 55

56 Quality comparison table for Level-3 AOD (global example)
Completeness – total time range: MODIS – Terra 2/2/2000–present, Aqua 7/2/2002–present; MISR – Terra 2/2/2000–present.
Local revisit time: MODIS – Terra 10:30 AM, Aqua 1:30 PM; MISR – Terra 10:30 AM.
Revisit time: MODIS – global coverage of the entire Earth in 1 day, coverage overlap near the poles; MISR – global coverage of the entire Earth in 9 days, coverage in 2 days in the polar region.
Swath width: MODIS – 2330 km; MISR – 380 km.
Spectral AOD: MODIS – AOD over ocean for 7 wavelengths (466, 553, 660, 860, 1240, 1640, 2120 nm), AOD over land for 4 wavelengths (466, 553, 660, 2120 nm); MISR – AOD over land and ocean for 4 wavelengths (446, 558, 672, and 866 nm).
AOD uncertainty or expected error (EE): MODIS – ±0.03 ± 5% (over ocean; QAC ≥ 1), ±0.05 ± 20% (over land; QAC = 3); MISR – 63% fall within 0.05 or 20% of Aeronet AOD, 40% are within 0.03 or 10%.
Successful retrievals: MODIS – 15% of the time; MISR – 15% of the time (slightly more because of retrieval over the glint region also).

57 Data Quality Screening Service for Remote Sensing Data
The DQSS filters out bad pixels for the user.
Default user scenario:
– search for data
– select the science team recommendation for quality screening (filtering)
– download screened data
More advanced scenario:
– search for data
– select custom quality screening parameters
– download screened data

58 The quality of data can vary considerably
AIRS Version 5 Level 2 Standard Retrieval statistics:
Total Precipitable Water – Best: 38%, Good: 38%, Do Not Use: 24%
Carbon Monoxide – Best: 64%, Good: 7%, Do Not Use: 29%
Surface Temperature – Best: 5%, Good: 44%, Do Not Use: 51%

59 Percent of biased data in MODIS aerosols over land increases as the confidence flag decreases.
*Compliant data are within ±(0.05 + 0.2 × Aeronet AOD) (a minimal compliance-check sketch follows below).
Statistics from Hyer, E., J. Reid, and J. Zhang, 2010: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech. Discuss., 3, 4091–4167.
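A minimal sketch of the compliance criterion quoted above: a MODIS retrieval is counted as compliant when it falls within ±(0.05 + 0.2 × Aeronet AOD) of the collocated Aeronet value. The paired values below are invented for illustration, not data from Hyer et al.

```python
# Minimal sketch: fraction of MODIS retrievals compliant with the
# +/-(0.05 + 0.2 * Aeronet AOD) envelope. Paired values are illustrative.
pairs = [  # (modis_aod, aeronet_aod)
    (0.18, 0.15),
    (0.40, 0.25),
    (0.95, 0.60),
    (0.08, 0.10),
]

def is_compliant(modis, aeronet):
    return abs(modis - aeronet) <= 0.05 + 0.2 * aeronet

n_ok = sum(is_compliant(m, a) for m, a in pairs)
print(f"compliant: {n_ok}/{len(pairs)} = {100.0 * n_ok / len(pairs):.0f}%")   # 2/4 = 50% here
```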

60 The effect of bad-quality data is often not negligible.
[Maps: total column precipitable water (kg/m²) screened at the Best, Good, and Do Not Use quality levels; Hurricane Ike, 9/10/2008]

61 DQSS replaces bad-quality pixels with fill values.
– Mask based on user criteria (quality level < 2); good-quality data pixels are retained (a minimal masking sketch follows below).
– The output file has the same format and structure as the input file (except for extra mask and original_data fields).
[Figure: original data array (total column precipitable water) and the mask applied to it]
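A minimal sketch of the screening step described above, assuming the AIRS-style convention where lower flag values are better (0 = best, 1 = good, 2 = do not use): pixels failing the user's criterion (quality level < 2) are replaced with a fill value while the rest pass through unchanged. Array contents and the fill value are illustrative, not the DQSS implementation.

```python
# Minimal sketch: replace bad-quality pixels with a fill value (DQSS-style screening).
# Assumed convention: lower quality value is better; user criterion keeps quality < 2.
import numpy as np

data = np.array([[55.1, 48.3, 60.2],
                 [12.4, 33.7, 41.0]])          # e.g. total precipitable water, kg/m^2
quality = np.array([[0, 1, 2],
                    [2, 1, 0]])                # per-pixel quality flags

FILL_VALUE = -9999.0
screened = np.where(quality < 2, data, FILL_VALUE)   # keep good pixels, fill the rest

print(screened)
# [[   55.1    48.3 -9999. ]
#  [-9999.     33.7    41. ]]
```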

62 AeroStat?
Different papers provide different views on whether MODIS and MISR measure aerosols well. Peer-reviewed papers usually lag well behind the latest version of the data.
It is difficult to verify the results of a published paper and resolve controversies between different groups, as it is difficult to reproduce the results – they might have dealt with different data or used different quality controls or flags.
It is important to have an online shareable environment where data processing and analysis can be done in a transparent way by any user of the environment and can be shared amongst all the members of the aerosol community.

63 Monthly AOD standard deviation. Areas with high AOD standard deviation might point to areas of high uncertainty. The next slide shows these areas on a map of mean AOD.

64 AeroStat: online platform for the statistical intercomparison of aerosols
[Workflow diagram: explore & visualize Level 3; compare Level 3; Level 3 is too aggregated – switch to high-res Level 2; explore & visualize Level 2; compare Level 2; correct Level 2; compare Level 2 before and after; merge Level 2 into a new Level 3] (EGU 2011)

65 Types of Bias Correction
– Relative (cross-sensor) linear, climatological: spatial basis – region; temporal basis – season; pros – not influenced by data in other regions, good sampling; cons – difficult to validate.
– Relative (cross-sensor) non-linear, climatological: spatial basis – global; temporal basis – full data record; pros – complete sampling; cons – difficult to validate.
– Anchored parameterized linear: spatial basis – near Aeronet stations; temporal basis – full data record; pros – can be validated; cons – limited areal sampling.
– Anchored parameterized non-linear: spatial basis – near Aeronet stations; temporal basis – full data record; pros – can be validated; cons – limited insight into correction.
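To make the first row concrete, here is a toy sketch of a relative (cross-sensor) linear correction: fit a regional, seasonal linear relationship between collocated values from sensor B and a chosen reference sensor A, then map B onto A's scale. The collocated values are invented; a real climatological correction would be built per region and season from many matchups.

```python
# Toy sketch: relative (cross-sensor) linear bias correction for one region/season.
# Collocated AOD pairs are illustrative.
import numpy as np

aod_ref = np.array([0.10, 0.20, 0.35, 0.50, 0.80])      # reference sensor A
aod_b   = np.array([0.14, 0.27, 0.44, 0.63, 0.98])      # sensor B, biased high

slope, intercept = np.polyfit(aod_b, aod_ref, deg=1)     # linear map from B onto A's scale

def correct(b_values):
    return slope * np.asarray(b_values) + intercept

print(f"slope={slope:.2f}, intercept={intercept:+.3f}")
print("corrected B:", np.round(correct(aod_b), 3))       # now close to the reference values
```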

66 Data Quality Issues
Validation of aerosol data shows that not all data pixels labeled as "bad" are actually bad when looked at from a bias perspective, but many pixels are biased differently for various reasons. (From Levy et al., 2009)

67 Quality & bias assessment using FreeMind, from the Aerosol Parameter Ontology
FreeMind allows capturing the various relations between aspects of aerosol measurements, algorithms, conditions, validation, etc. The "traditional" worksheets do not support the complex, multi-dimensional nature of the task.

68 Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379–408, doi:10.5194/amt-4-379-2011.
Title: MODIS Terra C5 AOD vs. Aeronet during Aug–Oct biomass burning in Central Brazil, South America.
(General statement): Collection 5 MODIS AOD at 550 nm during Aug–Oct over central South America highly over-estimates at large AOD and, in the non-burning season, under-estimates at small AOD, as compared to Aeronet; good comparisons are found at moderate AOD.
Region & season characteristics: the central region of Brazil is a mix of forest, cerrado, and pasture, and is known to have low AOD most of the year except during the biomass burning season.
(Example): Scatter plot of MODIS AOD at 550 nm vs. Aeronet, from the reference (Hyer et al., 2011).
(Description/caption): Shows severe over-estimation of MODIS Collection 5 AOD (dark-target algorithm) at large AOD at 550 nm during Aug–Oct 2005–2008 over Brazil.
(Constraints): Only the best quality of MODIS data (Quality = 3) used. Data with scattering angle > 170 deg excluded.
(Symbols): Red lines define the region of Expected Error (EE); green is the fitted slope.
Results: tolerance = 62% within EE; RMSE = 0.212; r2 = 0.81; slope = 1.00. For low AOD (… 1.4) slope = 1.54.
(Dominating factors leading to aerosol estimate bias):
1. Large positive bias in the AOD estimate during the biomass burning season may be due to wrong assignment of aerosol absorbing characteristics. (Specific explanation): a constant single scattering albedo of ~0.91 is assigned for all seasons, while the true value is closer to ~0.92–0.93. [Notes or exceptions: biomass burning regions in Southern Africa do not show as large a positive bias as in this case; it may be due to different optical characteristics or single scattering albedo of the smoke particles; Aeronet observations of SSA confirm this.]
2. Low AOD is common in the non-burning season. In low-AOD cases, biases are highly dependent on lower boundary conditions. In general a negative bias is found due to uncertainty in the surface reflectance characterization, which dominates if the signal from atmospheric aerosol is low.
[Figure: scatter plot of MODIS AOD vs. Aeronet AOD (0–2), central South America; stations: Mato Grosso, Santa Cruz, Alta Floresta]

69 AeroStat Ontology 69

70 Completeness: observing conditions for MODIS AOD at 550 nm over ocean
– US Atlantic Ocean – dominated by fine-mode aerosols (smoke & sulfate); 72% of retrievals within expected error; average Aeronet AOD 0.15; over-estimated relative to Aeronet (by 7%)*
– Indian Ocean – dominated by fine-mode aerosols (smoke & sulfate); 64% within expected error; average Aeronet AOD 0.16; over-estimated (by 7%)*
– Asian Pacific Oceans – dominated by fine aerosol, not dust; 56% within expected error; average Aeronet AOD 0.21; over-estimated (by 13%)
– "Saharan" ocean outflow regions in the Atlantic – dominated by dust in spring; 56% within expected error; average Aeronet AOD 0.31; random bias (1%)*
– Mediterranean – dominated by fine aerosol; 57% within expected error; average Aeronet AOD 0.23; under-estimated (by 6%)*
* Remer, L. A., et al., 2005: The MODIS Aerosol Algorithm, Products and Validation. Journal of the Atmospheric Sciences, Special Section, 62, 947–973.

71 Completeness: observing conditions for MODIS AOD at 550 nm over land
– Yanting, China – agriculture site (central China); 45% of retrievals within expected error; correlation w.r.t. the Chinese ground-based sun hazemeter: slope = 1.04, offset = −0.063, Corr^2 = 0.83; slightly over-estimated relative to the ground-based sensor.
– Fukang, China – semi-desert site (north-west China); 7% within expected error; slope = 1.65, offset = 0.074, Corr^2 = 0.58; over-estimated (by more than 100% at large AOD values).
– Beijing – urban site, industrial pollution; 35% within expected error; slope = 0.38, offset = 0.086, Corr^2 = 0.46; severely under-estimated (by more than 100% at large AOD values).
* Li, Z., et al., 2007: Validation and understanding of Moderate Resolution Imaging Spectroradiometer aerosol products (C5) using ground-based measurements from the handheld Sun photometer network in China, JGR, 112, D22S07, doi:10.1029/2007JD008479.

72 Summary
Quality is very hard to characterize; different groups will focus on different and inconsistent measures of quality.
– Modern ontology representations to the rescue!
Products with known quality (whether good or bad quality) are more valuable than products with unknown quality.
– Known quality helps you correctly assess fitness-for-use.
Harmonization of data quality is even more difficult than characterizing the quality of a single data product.

73 Summary
The Advisory Report is not a replacement for proper analysis planning.
– But it provides benefit for all user types by summarizing general fitness-for-usage, integrability, and data usage caveat information.
– Science user feedback has been very positive.
Provenance trace dumps are difficult to read, especially for non-software engineers.
– Science user feedback: "Too much information in provenance lineage, I need a simplified abstraction/view."
Transparency → Translucency – make the important stuff stand out.
Uncertainty and bias are evident at many levels!!

74 Future Work
– Advisor suggestions to correct for potential anomalies
– Views/abstractions of provenance based on specific user group requirements
– Continued iteration on visualization tools based on user requirements
– Present a comparability index / research techniques to quantify comparability

75 Your task when reviewing this.. Pull out all the informatics principles used… 75

76 Reading for this week None 76

77 What is next Week 11 – TBD Week 12 – guest lectures and written part of final project due Week 13 – project presentations 77

