Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Data Quality Screening Service for Remote Sensing Data Christopher Lynnes, NASA/GSFC (P.I.) Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC.

Similar presentations

Presentation on theme: "A Data Quality Screening Service for Remote Sensing Data Christopher Lynnes, NASA/GSFC (P.I.) Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC."— Presentation transcript:

1 A Data Quality Screening Service for Remote Sensing Data Christopher Lynnes, NASA/GSFC (P.I.) Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC Robert Wolfe, NASA/GSFC Contributions from R. Strub, T. Hearty, Y-I Won, M. Hegde, V. Jayanti S. Zednik, P. West, N. Most, S. Ahmad, C. Praderas, K. Horrocks, I. Tcherednitchenko, and A. Rezaiyan-Nojani Advancing Collaborative Connections for Earth System Science (ACCESS) Program

2 Outline  Why a Data Quality Screening Service?  Making Quality Information Easier to Use via the Data Quality Screening Service (DQSS)  DEMO  DQSS Architecture  DQSS Accomplishments  DQSS Going Forward  The Complexity of Data Quality 2

3 Why a Data Quality Screening Service? 3

4 The quality of data can vary considerablyThe quality of data can vary considerably AIRS VariableBest (%) Good (%) Do Not Use (%) Total Precipitable Water38 24 Carbon Monoxide64729 Surface Temperature54451 Version 5 Level 2 Standard Retrieval Statistics 4

5 Quality schemes can be relatively simple…Quality schemes can be relatively simple… Total Column Precipitable WaterQual_H2O BestGoodDo Not Use kg/m 2 5

6 …or they can be more complicated…or they can be more complicated Hurricane Ike, viewed by the Atmospheric Infrared Sounder (AIRS) PBest : Maximum pressure for which quality value is “Best” in temperature profiles Air Temperature at 300 mbar 6 300 mbar

7 Quality flags are also sometimes packed together into bytes Cloud Mask Status Flag 0=Undetermined 1=Determined Cloud Mask Cloudiness Flag 0=Confident cloudy 1=Probably cloudy 2=Probably clear 3=Confident clear Day / Night Flag 0=Night 1=Day Sunglint Flag 0=Yes 1=No Snow/ Ice Flag 0=Yes 1=No Surface Type Flag 0=Ocean, deep lake/river 1=Coast, shallow lake/river 2=Desert 3=Land Big-endian arrangement for the Cloud_Mask_SDS variable in atmospheric products from Moderate Resolution Imaging Spectroradiometer (MODIS) 7

8 Recommendations for quality filtering can be confusing AIRS Quality Indicators MODIS Aerosols Confidence Flags Data Assim. Best 0 Climatic Studies Good 1 Do Not Use Do Not Use 2 Use these flags in order to stay within expected error bounds 3 Very Good 2 Good 1 Marginal 0 Poor 3 Very Good 2 Good 1 Marginal 0 Poor Ocean Land ±0.05 ± 0.15  ±0.03 ± 0.10  Ocean Land Different framing for recommendations Opposite direction for QC codes

9 Current user scenarios...Current user scenarios...  Nominal scenario  Search for and download data  Locate documentation on handling quality  Read & understand documentation on quality  Write custom routine to filter out bad pixels  Equally likely scenario ( especially in user communities not familiar with satellite data )  Search for and download data  Assume that quality has a negligible effect Repeat for each user 9

10 The effect of bad quality data is often not negligible Total Column Precipitable Water Quality BestGood Do Not Use kg/m 2 Hurricane Ike, 9/10/2008 10

11 Neglecting quality may introduce bias (a more subtle effect) AIRS Relative Humidity Comparison against Dropsonde with and without Applying PBest Quality Flag Filtering Boxed data points indicate AIRS Relative Humidity data with dry bias > 20% From a study by Sun Wong (JPL) on specific humidity in the Atlantic Main Development Region for Tropical Storms 11 without QC filtering with QC filtering

12 Percent of Biased-High Data in MODIS Aerosols Over Land Increases as Confidence Flag Decreases *Compliant data are within + 0.05 + 0.2  Aeronet Statistics derived from Hyer, E., J. Reid, and J. Zhang, 2010, An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 3, 4091–4167. 12 Poor

13 Making Quality Information Easier to Use via the Data Quality Screening Service (DQSS) 13

14 DQSS TeamDQSS Team  P.I.: Christopher Lynnes  Software Implementation: Goddard Earth Sciences Data and Information Services Center  Implementation: Richard Strub  Local Domain Experts (AIRS): Thomas Hearty and Bruce Vollmer  AIRS Domain Expert: Edward Olsen, AIRS/JPL  MODIS Implementation  Implementation: Neal Most, Ali Rezaiyan, Cid Praderas, Karen Horrocks, Ivan Tcherednitchenko  Domain Experts: Robert Wolfe and Suraiya Ahmad  Semantic Engineering: Tetherless World Constellation @ RPI  Peter Fox, Stephan Zednik, Patrick West 14

15 The DQSS filters out bad pixels for the user  Default user scenario  Search for data  Select science team recommendation for quality screening (filtering)  Download screened data  More advanced scenario  Search for data  Select custom quality screening options  Download screened data 15

16 DQSS replaces bad-quality pixels with fill values Mask based on user criteria (Quality level < 1*) Good quality data pixels retained Output file has the same format and structure as the input file (except for extra mask and original_data fields) Original data array (Total column precipitable water) 16 *0 = Best

17 Visualizations help users see the effect of different quality filtersVisualizations help users see the effect of different quality filters 17

18 DQSS can encode the science team recommendations on quality screening  AIRS Level 2 Standard Product  Use only Best for data assimilation uses  Use Best+Good for climatic studies  MODIS Aerosols  Use only VeryGood (highest value) over land  Use Marginal+Good+VeryGood over ocean 18

19 Initial settings are based on Science Team recommendation. (Note: “Good” retains retrievals that are Good or better). You can choose settings for all parameters at once......or variable by variable Or, users can select their own criteria...Or, users can select their own criteria... 19

20 DEMO   Select Terra or Aqua Level 2 Atmosphere  Select “51 - Collection 5.1”!  Add results to Shopping Cart (do not go directly to “Order Now”)  Go to shopping cart and ask to “Post-Process and Order”   (Search for ‘AIRX2RET’) 20

21 DQSS ArchitectureDQSS Architecture 21

22 DQSS FlowDQSS Flow user selection Screener Quality Ontology data file quality mask screened data file End User data file w/ mask Masker Ontology Query 22

23 DQSS Ontology (The Whole Enchilada) 23 Data Field Binding Module Data Field Semantics Module Quality View Module

24 DQSS Quality View ZoomDQSS Quality View Zoom 24

25 DQSS encapsulates ontology and screening parameters MODAPS Minions LAAD SWeb GUI DQSS Ontology Service data product screening options MODAPS Database screening recipe (XML) selected options screening recipe (XML) screening parameters (XML) Encapsulation enables reuse in other data centers with diverse environments. 25

26 DQSS AccomplishmentsDQSS Accomplishments  DQSS is operational at two EOSDIS data centers: TRL = 9  MODIS L2 Aerosols  MLS L2 Water Vapor  AIRS Level 2 Retrievals  MODIS L2 Water Vapor  Metrics are being collected routinely  DQSS is almost released (All known paperwork filed) 26

27 Papers and PresentationsPapers and Presentations  Managing Data Quality for Collaborative Science workshop (peer-reviewed paper + talk)  Sounder Science meeting talk  ESDSWG Poster  ESIP Poster  A-Train User Workshop (part of GES DISC presentation)  AGU: Ambiguity of Data Quality in Remote Sensing Data 27

28 Metrics  DQSS is included in web access logs sent to EOSDIS Metrics System (EMS)  Tagged with “Protocol” DQSS  EMS  MCT (ESDS Metrics) bridge implemented for both DAACs*  Metrics from EMS:  Users: 208  Downloads: 73,452  But still mostly AIRS, some MLS  MODIS L2 Aerosol may be the “breakthrough” product 28 *DQSS is the first multi-DAAC bridge for ACCESS

29 ESDSWG Participation by DQSSESDSWG Participation by DQSS  Active participant (Lynnes) in Technology Infusion Working Group  Subgroups: Semantic Web, Interoperability and Processes and Strategies  Established and led two ESIP clusters as technology infusion activities  Federated Search  Discovery  Earth Science Collaboratory  Participated in ESDSWG reorganization tiger team  Founded ESDSWG colloquium series 29

30 DQSS Spinoff BenefitsDQSS Spinoff Benefits  DQSS staff unearthed a subtle error in the MODIS Dark Target Aerosols algorithm  Quality Confidence flags are sometimes set to Very Good, Good or Marginal when they should be marked as “Bad”  Fixed in Version 6 of the algorithm  Also fixed in Aerostat ACCESS project  Semantic Web technology infusion in DQSS enables future infusion of the RPI component of Multi-Sensor Data Synergy Advisor into the Atmospheric Composition Portal  Shared skills, shared language, shared tools  The Earth Science Collaboratory concept was born in a discussion with Kuo about DQSS in the ESDSWG Poster Session in New Orleans. 30

31 Going ForwardGoing Forward  Maintain software as part of GES DISC core services  Modify as necessary for new releases  New data product versions have simpler quality schemes  Add data products as part of data support  Sustainability Test: How long to add a product?  Answer: a couple days for MLS Ozone  Continue outreach  Demonstrated to scientists in NASA ARSET  Release Software and Technology 31

32 DQSS RecapDQSS Recap  Screening satellite data can be difficult and time consuming for users  The Data Quality Screening System provides an easy-to- use service  The result should be:  More attention to quality on users’ part  More accurate handling of quality information…  …With less user effort 32

33 The Complexity of Data Quality Part 1  Quality Control ≠ Quality Assessment  QC represents algorithm’s “happiness” with an individual retrieval or profile at “pixel” level  QA is science team’s statistical assessment of quality at the product level, based on cal / val campaigns 33 Algorithm version N Data Product version N Science Team Algorithm version N+1 Data Product version N+1 Science Team QC guessquality assessment improved QC estimation QC guess quality assessment

34 QC vs. QA – User ViewQC vs. QA – User View Give me just the good- quality data values Tell me how good the dataset is Data Provider Quality Control Quality Assessment

35 Quality (Priority) is in the Eye of the BeholderQuality (Priority) is in the Eye of the Beholder  Climate researchers: long-term consistency, temporal coverage, spatial coverage  Algorithm developers: accuracy of retrievals, information content  Data assimilators : spatial consistency  Applications: latency, spatial resolution  Education/Outreach: usability, simplicity 35

36 Recommendation 1: Harmonize Quality Terms  Start with ISO 19115/19157 Data Quality model, but... Q: Examples of “Temporal Consistency” quality issues? ISO: Temporal Consistency=“correctness of the order of events” Land Surface Temperature anomaly from Advanced Very High Resolution Radiometer trend artifact from orbital drift discontinuity artifact from change in satellites

37 Recommendation 2: Address More Dimensions of Quality  Accuracy: measurement bias + dispersion  Accuracy of data with low-quality flags  Accuracy of grid cell aggregations  Consistency:  Spatial  Temporal  Observing conditions

38 Recommendation #2 (cont.) More dimensions of quality  Completeness  Temporal: Time range, diurnal coverage, revisit frequency  Spatial: Coverage and Grid Cell Representativeness  Observing conditions: cloudiness, surface reflectance  N.B.: Incompleteness affects accuracy via sampling bias  AIRS dry sampling bias at high latitudes due to incompleteness in high-cloud conditions, which tend to have rain and convective clouds with icy tops  AIRS wet sampling bias where low clouds are prevalent 38

39 Quality Indicator: AOD spatial completeness (coverage) MODIS Aqua MISR Due to a wider swath, MODIS AOD covers more area than MISR. The seasonal and zonal patterns are rather similar. Average Percentage of Non-Fill Values in Daily Gridded Products

40 Quality Indicator: Diurnal Coverage for MODIS Terra in summer 40 Local Hour of Day Probability (%) of an “Overpass*” in a Given Day *That is, being included in a MODIS L2 pixel during that hour of the day Because Terra is sun-synchronous with a 10:30 equator crossing, observations are limited to a short period during the day at all but high latitudes (Arctic in summer).

41 Recommendation 3: Address Fitness for Purpose Directly*  Standardize terms of recommendation  Enumerate more positive realms and examples  Enumerate negative examples *A bit controversial

42 Recommendation #4: More Outreach...Recommendation #4: More Outreach... ...Once we know what we want to say  Quality Screening: no longer any excuse not to for DQSS- served datasets  Should NASA supply DQSS for more datasets?  Quality Assessment: dependent on Recommendations 1-3  Venues  Science team meetings...but for QC this is “preaching to the choir”  NASA Applied Remote Sensing Education and Training (ARSET): get them at the start of their NASA data usage!  Demonstrated DQSS and Aerostat to ARSET  Any others like ARSET?  Training workshops? 42

43 Backup SlidesBackup Slides 43

44 Lessons LearnedLessons Learned  The tall pole is acquiring knowledge about what QC indicators really mean, and how they should be used.  Is anyone out there actually using the quality documentation?  Detailed analysis by DQSS of quality documentation turned up errors, ambiguities, inconsistencies, even an algorithm bug  Yet the help desks are not getting questions about these...  Seemed like a good idea at the time...  Visualizations: good for validating screening, but do users use them?  Hdf-java: not quite the panacea for portability we hoped  DQSS – OPeNDAP gateway:  OPeNDAP access to DQSS-screened data enables use by IDV /McIDAS- V  But it’s hard to integrate into a search interface  (Not giving up yet...) 44

45 Possible Client Side ConceptPossible Client Side Concept Ontology Web Form Data Provider Masker Screener dataset-specific instance info data files screening criteria XML File data files GES DISCEnd User 45

46 DQSS Target Uses and UsersDQSS Target Uses and Users Routine Visual- ization Quick Recon. Machine- levelMetrics InterdisciplinaryXX EducationalXX Expert/Power? XX Applications X Algorithm Developers X 46

47 ESDSWG ActivitiesESDSWG Activities  Contributions to all 3 TIWG subgroups  Semantic Web:  Contributed Use Case for Semantic Web tutorial @ ESIP  Participation in ESIP Information Quality Cluster  Services Interoperability and Orchestration  Combining ESIP OpenSearch with servicecasting and datacasting standards  Spearheading ESIP Federated OpenSearch cluster  5 servers (2 more soon); 4+ clients  Now ESIP Discover cluster (see above)  Processes and Strategies:  Developed Use Cases for Decadal Survey Missions: DESDynI-Lidar, SMAP, and CLARREO 47

48 File-Level Quality Statistics are not always useful for data selection Study Area Percent Cloud Cover?

49 Level 3 grid cell standard deviation is difficult to interpret due to its dependence on magnitude Mean Standard Deviation MODIS Aerosol Optical Thickness at 550 nm

50 Neither pixel count nor standard deviation alone express how representative the grid cell value is MODIS Aerosol Optical Thickness at 0.55 microns Level 3 Grid AOT Mean Level 2 Swath Level 3 Grid AOT Standard Deviation Level 3 Grid AOT Input Pixel Count 0 1 2 0 10.5 1 122

Download ppt "A Data Quality Screening Service for Remote Sensing Data Christopher Lynnes, NASA/GSFC (P.I.) Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC."

Similar presentations

Ads by Google