A Probabilistic-Spatial Approach to the Quality Control of Climate Observations Christopher Daly, Wayne Gibson, Matthew Doggett, Joseph Smith, and George Taylor

Presentation transcript:

A Probabilistic-Spatial Approach to the Quality Control of Climate Observations Christopher Daly, Wayne Gibson, Matthew Doggett, Joseph Smith, and George Taylor Spatial Climate Analysis Service Oregon State University Corvallis, Oregon, USA

Traditional QC Systems are Categorical and Deterministic
- Data subjected to categorical quality checks
  - Designed to uncover mistakes
- Validity determined from test results
  - Mistake = flag / toss
  - No mistake = no flag / keep
Designed to Work With Human Observing Systems

Alien Electronic Devices are Invading the Climate Observing World! They’re Everywhere!

Electronic Sensors and Modern Applications Create Challenges for Traditional QC Systems
Situation:
- Errors tend to be continuous drift, rather than categorical mistakes
- Increasing usage of computer applications that rely on climate observations
Need:
- Continuous estimates, rather than categorical tests, of observation validity
- Quantitative estimates of observational uncertainty, not just flags

More Challenges…
Situation:
- Range of applications is increasing rapidly, and each has a different tolerance for outliers
- Data are often more voluminous and disseminated in a more timely manner
Need:
- Probabilistic information from which a decision to use an obs can be made, not an up-front decision
- Automated QC methods

An Opportunity
Advances in climate mapping technology now make it possible to estimate a reasonably accurate “expected value” for an observation based on surrounding stations.
Assumption: Spatial consistency is related to observation validity

Useful Characteristics for a Next-Generation Climate QC System
continuous, quantitative, probabilistic, automated, spatial

PRISM Probabilistic-Spatial QC (PSQC) System for SNOTEL Data
Uses climate mapping technology and climate statistics to provide a continuous, quantitative confidence probability for each observation, estimate a replacement value, and provide a confidence interval for that replacement.
- Start with daily max/min temperature for all SNOTEL sites, period of record
- Move to precipitation, SWE, soil temperature and moisture
- Develop automated system for near-real time operation at NRCS

Climatological Grid Development
- PRISM must produce a high-quality estimate of temperature at each SNOTEL station each day
- Highest interpolation skill is obtained by using a high-quality predictive grid that represents the long-term climatological temperature for that day, rather than a digital elevation grid
- Climatological grid: 0.8 km resolution

Leveraging Information Content of High-Quality Climatologies to Create New Maps with Fewer Data and Less Effort
Figure: Oregon Annual Precipitation
Climatology used in place of DEM as PRISM predictor grid

PRISM Regression of “Weather vs Climate” 20 July 2000 Tmax vs Mean July Tmax

PRISM: Parameter-elevation Regressions on Independent Slopes Model
- Generates gridded estimates of climatic parameters
- Moving-window regression of climate vs. elevation for each grid cell
- Uses nearby station observations
- Spatial climate knowledge base (KBS) weights stations in the regression function by their climatological similarity to the target grid cell
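The station-weighted regression described on this slide can be illustrated with a minimal sketch. It is not the PRISM code: the function name `weighted_linear_fit`, the station values, and the weights are all hypothetical, and the predictor `x` stands for whatever the regression uses (elevation in the classic PRISM setup, or a climatological grid value when climatology replaces the DEM).

```python
import numpy as np

def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)
    sxx = np.sum(w * (x - x_bar) ** 2)
    sxy = np.sum(w * (x - x_bar) * (y - y_bar))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical nearby stations: predictor values (e.g. mean July Tmax, deg C)
# and the daily observations to be regressed against them.
predictor = np.array([18.0, 20.0, 22.0, 25.0, 27.0])
daily_obs = 2.0 + 1.1 * predictor          # made-up, exactly linear data
# Hypothetical weights: larger means more similar to the target grid cell.
weights = [1.0, 0.8, 0.6, 0.3, 0.1]

intercept, slope = weighted_linear_fit(predictor, daily_obs, weights)
prediction = intercept + slope * 23.0      # estimate at the target cell
```

Stations with small weights still enter the fit but pull the line less, which is the role the knowledge base plays in the slide's description.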

PRISM: Parameter-elevation Regressions on Independent Slopes Model
PRISM KBS accounts for spatial variations in climate due to:
- Elevation
- Terrain orientation
- Terrain steepness
- Moisture regime
- Coastal proximity
- Inversion layer
- Long-term climate patterns

PRISM Moving-Window Regression Function Mean April Precipitation, Qin Ling Mountains, China Weighted linear regression

Rain Shadows: Mean Annual Precipitation, Oregon Cascades
Map labels: Portland, Eugene, Sisters, Redmond, Bend, Mt. Hood, Mt. Jefferson, Three Sisters; 350 mm/yr, 2200 mm/yr, 2500 mm/yr
Dominant PRISM KBS Components: Elevation, Terrain orientation, Terrain steepness, Moisture regime

Mean Annual Precipitation, Cascade Mtns, OR, USA

Coastal Effects: July Maximum Temperature, Central California Coast
Map labels: Monterey, San Francisco, San Jose, Santa Cruz, Hollister, Salinas, Stockton, Sacramento, Fremont, Oakland, Pacific Ocean; Preferred Trajectories; 34°, 20°, 27°
Dominant PRISM KBS Components: Elevation, Coastal proximity, Inversion layer

Inversions: July Minimum Temperature, Northwestern California
Map labels: Ukiah, Cloverdale, Lakeport, Willits, Clear Lake, Lake Pillsbury, Pacific Ocean; 12°, 17°, 9°, 16°, 10°, 17°
Dominant PRISM KBS Components: Elevation, Inversion layer, Topographic index, Coastal proximity

PRISM PSQC System Confidence Probability (CP)
Definition of CP: Given the difference between an observation and an expected value (the residual), CP is the probability that another observation and expected value from the same time of year would differ by at least as much.
Residual distribution: +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
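As a rough illustration of this definition, the sketch below builds the 5-year by 31-day window and converts a residual into a two-sided normal p-value. The function names `window_dates` and `confidence_probability` are hypothetical, and treating the window residuals as normally distributed is an assumption made for the sketch, not a statement of how the PSQC system is coded.

```python
import math
from datetime import date, timedelta

def window_dates(target, half_days=15, half_years=2):
    """Dates within +/- half_days of the target's calendar day, for each of
    the target year and the half_years years before and after it."""
    dates = []
    for dy in range(-half_years, half_years + 1):
        try:
            center = target.replace(year=target.year + dy)
        except ValueError:
            # 29 February in a non-leap year: fall back to 28 February.
            center = target.replace(year=target.year + dy, day=28)
        for dd in range(-half_days, half_days + 1):
            dates.append(center + timedelta(days=dd))
    return dates

def confidence_probability(residual, window_residuals):
    """Two-sided probability (percent) that a residual drawn from the window
    distribution departs from the window mean at least as much as this one."""
    n = len(window_residuals)
    mean = sum(window_residuals) / n
    var = sum((r - mean) ** 2 for r in window_residuals) / (n - 1)
    z = abs(residual - mean) / math.sqrt(var)
    return 100.0 * math.erfc(z / math.sqrt(2.0))

dates = window_dates(date(2000, 7, 14))
cp_typical = confidence_probability(0.0, [-1.0, 1.0])
cp_one_sd = confidence_probability(math.sqrt(2.0), [-1.0, 1.0])
```

A residual equal to the window mean yields a CP of 100%, and the value falls toward 0% as the residual moves into the tails of the window distribution.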

Confidence Probability Takes into Account Uncertainty in the System
P-value is higher for a given deviation from the mean when Sx is large (low skill)
X = Residual (P-O)
Figure panels: Low Overall Skill, High Overall Skill

Interpreting Confidence Probability
Continuous values from 0 to 100%
- 0% = highly spatially inconsistent observation, reflected in a PRISM prediction that is unusually different from the observation
- 100% = highly consistent observation, reflected in a PRISM prediction that is relatively close to the observation
Guidelines to date:
- CP > 30: Use observation as-is
- 10 < CP < 30: Blend prediction and observation
- CP < 10: Use prediction instead of observation
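The guidelines above could be applied as in the following sketch. The slide does not say how prediction and observation are blended, so the linear weighting between CP = 10 and CP = 30 is purely an assumption, as are the function name `apply_cp_guidelines` and the treatment of CP values exactly equal to 10 or 30.

```python
def apply_cp_guidelines(observation, prediction, cp):
    """Return (value to use, action) for a CP given in percent."""
    if cp > 30:
        return observation, "use observation"
    if cp < 10:
        return prediction, "use prediction"
    # Assumed blend: weight on the observation rises linearly from 0 at
    # CP = 10 to 1 at CP = 30, so the result is continuous at both limits.
    w_obs = (cp - 10) / 20
    return w_obs * observation + (1 - w_obs) * prediction, "blend"

high = apply_cp_guidelines(20.0, 10.0, 50)
low = apply_cp_guidelines(20.0, 10.0, 5)
middle = apply_cp_guidelines(20.0, 10.0, 20)
```

Because CP is continuous, an application with a different tolerance for outliers could substitute its own thresholds without rerunning the QC.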

PRISM PSQC Process 1. Create Database Records
Goal: Enter daily tmax/tmin observations for all networks into database and prepare data
Current Actions:
1. Ingest daily tmin/tmax observations from SNOTEL, COOP, RAWS, Agrimet, ASOS, and first-order networks.
2. Shift AM COOP observations of tmax to previous day (assumes standard diurnal curve, which does not always apply).
3. Convert units to degrees Celsius.
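Actions 2 and 3 can be sketched as a small record-preparation step. The record layout (a dict with `network`, `obs_time`, `variable`, `date`, `value`, and `units` keys) and the function names are hypothetical, chosen only to make the example self-contained.

```python
from datetime import date, timedelta

def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0

def prepare_record(record):
    """Return a copy of the record with the tmax date shift and unit
    conversion applied. The input record is left unchanged."""
    out = dict(record)
    # A morning COOP reading of tmax is assumed to describe the previous
    # afternoon, so it is assigned to the previous day.
    if (out["network"] == "COOP" and out["obs_time"] == "AM"
            and out["variable"] == "tmax"):
        out["date"] = out["date"] - timedelta(days=1)
    if out["units"] == "F":
        out["value"] = fahrenheit_to_celsius(out["value"])
        out["units"] = "C"
    return out

raw = {"network": "COOP", "obs_time": "AM", "variable": "tmax",
       "date": date(2000, 7, 15), "value": 212.0, "units": "F"}
prepared = prepare_record(raw)
```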

PRISM PSQC Process 2. Single-Station Checks
Goal: Take all QC actions possible at the single-station level, before entering the spatial QC process.
Current Checks:
1. Temperature observation is well above the all-time record maximum or well below the all-time record minimum for the state: flag set and CP set to 0
2. Maximum temperature is less than the minimum temperature: flag set and CP set to 0
3. First daily tmax/tmin observation after a period of missing data: flag set and CP set to 0 (COOP only?)
4. More than 10 consecutive observations with the same value (<+/-1F COOP, <+/-0.1C others), or more than 5 consecutive zero values, is a definite flatliner: flag set and CP set to 0
5. A shorter run of consecutive observations with the same value is a potential flatliner, to be assessed by the spatial QC system: flag set and CP unchanged
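Two of these checks (tmax below tmin, and flatliner runs) are sketched below. The function names are hypothetical, and the `potential_run=5` threshold for a potential flatliner is an assumed placeholder because the run length for that check is not recoverable from the slide. The record-extreme, missing-data, and zero-value checks are omitted.

```python
def run_lengths(values, tol):
    """For each day, the length of the run of near-identical values
    (within tol of the run's first value) ending on that day."""
    lengths = []
    start_value = None
    count = 0
    for v in values:
        if start_value is not None and abs(v - start_value) < tol:
            count += 1
        else:
            start_value = v
            count = 1
        lengths.append(count)
    return lengths

def single_station_flags(tmax, tmin, tol=0.1, definite_run=10, potential_run=5):
    """Return one (flag, cp) pair per day. A cp of None means the check
    leaves CP unchanged. Days are flagged only once the run has reached
    the threshold; earlier days in the same run are not revisited."""
    results = []
    runs = run_lengths(tmax, tol)
    for i in range(len(tmax)):
        if tmax[i] < tmin[i]:
            results.append(("tmax_below_tmin", 0))
        elif runs[i] > definite_run:
            results.append(("definite_flatliner", 0))
        elif runs[i] >= potential_run:          # assumed threshold
            results.append(("potential_flatliner", None))
        else:
            results.append((None, None))
    return results

# Hypothetical series: twelve identical tmax values, then an impossible day.
tmax = [20.0] * 12 + [25.0]
tmin = [5.0] * 12 + [30.0]
flags = single_station_flags(tmax, tmin)
```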

PRISM PSQC Process 3. Spatial QC System
Goal: Through a series of iterations, gradually and systematically “weed out” spatially inconsistent observations from consistent ones
Overview:
1. PRISM is run for each station location for each day, and summary statistics are accumulated
2. Once all days have been run, frequency distributions are developed and confidence probabilities (CP) for each daily station observation are estimated
3. These CP values are used to weight the daily observations in a second iteration of PRISM daily runs
4. Obs with lower CP values are given lower weight, and thus have less influence, in the second set of PRISM predictions, and are also given lower weight in the calculation of the second set of summary statistics
5. CP values are again calculated and passed back to the daily PRISM runs
6. This iterative process continues for about 5 iterations, at which time the CP values have reached equilibrium
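The iteration outlined above can be mimicked with a toy example. In this sketch a CP-weighted mean of the other stations stands in for the PRISM prediction, which is a deliberate simplification; the function name `iterate_cp`, the whole-record statistics (rather than a moving seasonal window), and the synthetic data are assumptions for illustration only.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def iterate_cp(obs, n_iter=5, min_sigma=2.0):
    """obs has shape (n_days, n_stations). Returns CP in percent."""
    obs = np.asarray(obs, dtype=float)
    cp = np.full(obs.shape, 100.0)
    for _ in range(n_iter):
        w = np.maximum(cp, 1e-6)                 # keep weights positive
        total_w = w.sum(axis=1, keepdims=True)
        total_wo = (w * obs).sum(axis=1, keepdims=True)
        # Stand-in for PRISM: weighted mean of the *other* stations that day.
        pred = (total_wo - w * obs) / (total_w - w)
        resid = pred - obs                        # R = P - O
        # CP-weighted summary statistics per station.
        mean = np.average(resid, axis=0, weights=w)
        var = np.average((resid - mean) ** 2, axis=0, weights=w)
        sigma = np.maximum(np.sqrt(var), min_sigma)
        z = np.abs(resid - mean) / sigma
        cp = 100.0 * _erfc(z / math.sqrt(2.0))    # two-sided p-value
    return cp

# Synthetic data: 5 stations sharing one signal over 30 days,
# with a 20-degree spike at station 2 on day 10.
days = np.arange(30)
obs = np.tile((10.0 + 0.1 * days)[:, None], (1, 5))
obs[10, 2] += 20.0
cp = iterate_cp(obs)
```

In the first pass the spike also contaminates the predictions for its neighbours and lowers their CP on that day. Once the spiked observation is down-weighted, the neighbours' predictions recover, which is the "weeding out" behaviour the slide describes.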

QC Iteration
For each station-day:
- Run PRISM for each station location in its absence, estimating its obs for each day
- PRISM omits nearby stations, singly, and in pairs, to try to better match observation
- Prediction closest to obs is accepted
  - Raw PRISM variables: Observation (O), Prediction (P), Residual (R=P-O), PRISM Regression Standard Deviation (S)
Once all station-days are run:
- Calculate summary statistics for each station for each day
  - Mean and std dev of O (Os), P (Ps), R (Rs), and S (Ss)
  - +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
  - 5-day running Standard Deviation (RunSD) as a measure of day-to-day variability (time shifting)
  - Potential flatliners: calculate V, the ratio of station’s RunSD (set to 0.3) to that of surrounding stations
- Determine “effective” standard deviation for frequency distribution
  - Sigma = Max ( Rs, S, Ss, RunSD, 2 )
- Calculate probability statistics for O, P, R, S, and V for each day
  - Probability statistics are p-values from z-tests
  - Residual Probability (RP) used as an estimate of overall Confidence Probability (CP) for an observation
  - Except in the case of potential flatliners, where CP = min(RP,VP)
- CP used to weight stations in next iteration
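The effective standard deviation and the flatliner rule from this slide can be written out as a short sketch. The function names are hypothetical; reading Rs and Ss as the windowed standard deviations of R and S, using a centered 5-day window for RunSD, and using a two-sided z-test are all assumptions where the slide is not explicit.

```python
import math
import statistics

def running_sd(values, window=5):
    """Centered running standard deviation; edge days use the days available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(statistics.stdev(chunk) if len(chunk) > 1 else 0.0)
    return out

def effective_sigma(rs, s, ss, run_sd, floor=2.0):
    """Sigma = Max(Rs, S, Ss, RunSD, 2), as given on the slide."""
    return max(rs, s, ss, run_sd, floor)

def two_sided_p(value, mean, sigma):
    """Two-sided z-test p-value, in percent."""
    z = abs(value - mean) / sigma
    return 100.0 * math.erfc(z / math.sqrt(2.0))

def confidence_probability(residual, mean_residual, sigma, vp=None):
    """CP = RP, or min(RP, VP) when a flatliner probability VP is supplied."""
    rp = two_sided_p(residual, mean_residual, sigma)
    return rp if vp is None else min(rp, vp)

sigma_a = effective_sigma(rs=1.2, s=0.8, ss=1.0, run_sd=3.5)
sigma_b = effective_sigma(rs=1.0, s=1.0, ss=1.0, run_sd=1.0)
cp_plain = confidence_probability(0.0, 0.0, sigma_b)
cp_flat = confidence_probability(0.0, 0.0, sigma_b, vp=12.0)
flat_sd = running_sd([1, 1, 1, 1, 1])
```

Taking the maximum over several spread estimates, with a floor of 2, keeps CP from collapsing when any single estimate of variability is unrealistically small.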

Drifting sensor: MCKENZIE PASS (21E07S) Observations and CP values, Date:

Drifting sensor: MCKENZIE PASS (21E07S) Climatology vs Observation and Prediction, Date:

Warm Bias: SALT CREEK FALLS (22F04S) Observations, Date:

Anomalies and CP values, 7-21 July 2000 Warm Bias: SALT CREEK FALLS (22F04S) 14 July

Scatter Plot: Climatology vs Observation, 14 July 2000 Warm Bias: SALT CREEK FALLS (22F04S) 22F04S Odell Lake COOP

Tmax Observations, Date:

Computing Obstacles
- Computing: currently takes about 60 hours to run PRISM PSQC system for SNOTEL sites in the western US
  - 14-processor cluster
- Disk space: we now have > 1 TB, but will probably need more
- Funds are insufficient to “do it right”

Issues to Consider
- How far can the assumption be taken that spatial consistency equates with validity?
- Are continuous and probabilistic QC systems useful for manual observing systems?
- Can a high-quality QC system ever be completely automated?