

A Procedure for Automated Quality Control and Homogenization of historical daily temperature and precipitation data (APACH). Part 1: Quality Control of Argentinean daily data

J.-P. Boulanger (1), J. Aizpuru (2), L. Leggieri (2) and M. Marino (3)
(1) IRD, (2) FCEN/UBA, (3) SMN

INTRODUCTION

The quality-control method is based on a set of decision-tree algorithms that analyze precipitation, minimum temperature and maximum temperature separately. All our tests are non-parametric. We applied the method to the historical daily database of the Argentine National Weather Service (Figs. 1 & 2).

RESULTS

The number of flagged temperature data followed the history of the National Weather Service fairly well. Before October 1967, most of the data were filed on paper cards, some of which were lost or badly digitized because of humidity and dust, and no quality control was applied; for that period our tests flagged a larger number of data (especially Doubtful and Suspect; Fig. 4). During the period, we observed a minimum of detection made by the NWS manual quality control (Fig. 4). The percentage of Doubtful or Suspect data is larger in regions where the network is denser (Fig. 5).

The extreme precipitation test aims at detecting very high precipitation values. The flagged data were more numerous before 1976 than later (Fig. 6). This result suggests that, although they are difficult to detect, some erroneous high precipitation values may exist in the database, so our flagged data require a thorough manual quality control. This will be done in the future by the National Weather Service.

The drought sequence test relies largely on neighbouring stations. As in the temperature case, more drought sequences (flagged as NeedCheck, Doubtful or Suspect) were found at the beginning of the period. A minimum was observed in the early 1970s, before the Military Coup (1976). Since then, a weak positive trend in the number of flagged dry days (especially NeedCheck) has been detected. It is possible that the decrease in the number of active stations since 1976 (Fig. 2) has reduced the number of neighbouring stations used in our tests and thus increased the number of NeedCheck flags.

Our method will be applied in the framework of the CLARIS LPB project.

PRECIPITATION

The procedure to control the quality of the daily precipitation data is based on two classes of tests (see Boulanger et al., 2008 for a description of these complex tests):

1. Extreme daily precipitation
2. Extreme dry sequences

Fig. 6: Interannual variability of the number of potential outliers in daily precipitation.

Fig. 7: (upper panel) Seasonal cycle of the number of dry days classified as NeedCheck (solid line), Doubtful (dashed) and Suspect (dash-dot) cases. Doubtful and Suspect cases have been multiplied by 100 to be displayed on the same plot as the NeedCheck values; (middle panel) Interannual variability of the percentage of NeedCheck (solid line), Doubtful (dashed) and Suspect (dash-dot) cases; (lower panel) Interannual variability of the total percentage of the NeedCheck, Doubtful and Suspect cases.

Fig. 5: Percentage of NeedCheck, Doubtful and Suspect values for minimum (upper panel) and maximum (lower panel) temperature (percentage computed relative to the total number of observed values). Symbols: filled circles represent a percentage larger than 15% (NeedCheck), 0.1% (Doubtful) and 0.05% (Suspect). Circles represent a percentage larger than 5% (NeedCheck), 0.04% (Doubtful) and 0.01% (Suspect). Crosses represent a percentage smaller than 5% (NeedCheck), 0.04% (Doubtful) and 0.01% (Suspect).

Fig. 4: Percentage of minimum (solid) and maximum (dash-dot) temperature observations classified as NeedCheck (top panel), Doubtful (middle panel) and Suspect (bottom panel).

Fig. 3: Daily minimum and maximum temperature decision tree. All codes are explained in the tables to the right.
TEMPERATURE

We rescaled the data relative to the station percentiles, computing distances to the 25th (for lower values) or 75th (for upper values) percentile (P25 or P75) as follows:

Per(X) = (X - P25) / (P50 - P25)   if X <= P25 (25th percentile)
Per(X) = (X - P75) / (P75 - P50)   if X >= P75 (75th percentile)
Per(X) = 0                         otherwise

Our procedure to control the quality of minimum and maximum daily temperature is based on three classes of checks:

1. Consistency checks (single station): verify that T_min < T_max.
2. Range, step and DIP checks based on the station distribution (single station):
- Outlier: OUT(T_t) = Per(T_t)
- Step: STEP(T_t) = Per(T_t - T_{t-1})
- DIP: DIP(T_t) = -STEP(T_t) * STEP(T_{t+1}) if (T_t - T_{t-1}) * (T_{t+1} - T_t) < 0; DIP(T_t) = 0 otherwise.
3. Spatial check based on neighbouring stations (spatial):
- We pre-select neighbouring stations located less than 500 km from the analyzed station and with a correlation larger than 0.8 (the correlation is computed only between data of the same month: January, February, ...).
- For each pre-selected neighbouring station, we compute a linear regression between the analyzed and the neighbour daily data.
- Then we compute an interpolated value from the neighbours (the interpolation formula is not preserved in this transcript).
4. We then compute the difference between the analyzed and the interpolated value. The difference is normalized by the standard deviation of the daily temperature of the month of the day to check. The larger this normalized distance, the stronger the confidence in stating that the data is erroneous.
5. Finally, we also compute the angle around the analyzed station covered by all the neighbours.

In conclusion, for a given distance, the larger the angle covered by the neighbouring stations, the stronger the assertion that the data is Useful or Suspect. Similarly, for a given angle, the larger the distance, the stronger the assertion that the data is Useful or Suspect.

Figure 2: Evolution of the weather station network during the period. Only stations with less than 10% of missing data are counted, leading to different but similar results for minimum temperature (dashed), maximum temperature (dash-dot) and precipitation (solid).

Figure 1: Spatial location of the Argentine weather stations providing daily temperature and precipitation records during all or part of the period.
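
The single-station temperature checks described in the TEMPERATURE section (the Per rescaling and the OUT, STEP and DIP scores) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sample series and the choice of the "inclusive" quantile method are assumptions made for illustration, as is the decision to compute the STEP quartiles from the series of day-to-day differences; the thresholds that turn scores into NeedCheck, Doubtful or Suspect flags belong to the decision tree of Fig. 3 and are not reproduced here.

```python
from statistics import quantiles


def quartiles(values):
    """Return (P25, P50, P75); the 'inclusive' method is an assumption."""
    p25, p50, p75 = quantiles(values, n=4, method="inclusive")
    return p25, p50, p75


def per(x, p25, p50, p75):
    """Per(X) as defined in the text: scaled distance outside the P25-P75 band."""
    if p50 == p25 or p75 == p50:
        raise ValueError("quartiles must be distinct to rescale")
    if x <= p25:
        return (x - p25) / (p50 - p25)
    if x >= p75:
        return (x - p75) / (p75 - p50)
    return 0.0


def out_scores(temps):
    """OUT(T_t) = Per(T_t), using the quartiles of the temperature series."""
    q = quartiles(temps)
    return [per(t, *q) for t in temps]


def step_scores(temps):
    """STEP(T_t) = Per(T_t - T_{t-1}); undefined (None) for the first day.

    Assumption: the quartiles are those of the day-to-day differences.
    """
    diffs = [b - a for a, b in zip(temps, temps[1:])]
    q = quartiles(diffs)
    return [None] + [per(d, *q) for d in diffs]


def dip_scores(temps):
    """DIP(T_t) = -STEP(T_t) * STEP(T_{t+1}) when the series reverses, else 0."""
    steps = step_scores(temps)
    dips = [0.0] * len(temps)
    for t in range(1, len(temps) - 1):
        if (temps[t] - temps[t - 1]) * (temps[t + 1] - temps[t]) < 0:
            dips[t] = -steps[t] * steps[t + 1]
    return dips


# Hypothetical series with an isolated spike on the sixth day (index 5).
temps = [10.0, 11.0, 9.0, 12.0, 11.0, 26.0, 10.0, 12.0, 9.0, 9.0]
out = out_scores(temps)
dip = dip_scores(temps)
print(max(range(len(temps)), key=lambda t: dip[t]))  # index of the largest DIP
```

In this made-up series the spike is the day with both the largest OUT and the largest DIP score, which is the behaviour the checks are designed to capture: a value far outside the interquartile band that is reached and left by two large steps of opposite sign.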