A Procedure for Automated Quality Control and Homogenization of historical daily temperature and precipitation data (APACH). Part 1: Quality Control of Argentinean daily data
J.-P. Boulanger (1), J. Aizpuru (2), L. Leggieri (2) and M. Marino (3)
(1) IRD, (2) FCEN/UBA, (3) SMN

INTRODUCTION
The quality-control method is based on a set of decision-tree algorithms that analyze precipitation, minimum temperature and maximum temperature separately. All our tests are non-parametric. We applied the method to the 1959-2005 historical daily database of the Argentine National Weather Service (Figs. 1 & 2).

TEMPERATURE
We rescale the data into percentile-based scores by computing distances to the 25th percentile (for low values) or the 75th percentile (for high values), P25 or P75, as follows:
Per(X) = (X - P25) / (P50 - P25) if X <= P25
Per(X) = (X - P75) / (P75 - P50) if X >= P75
Per(X) = 0 otherwise.
Our procedure to control the quality of minimum and maximum daily temperature is based on three classes of checks:
1. Consistency check (single station): verify that T_min < T_max.
2. Range, step and DIP checks based on the station's own distribution (single station); see the sketch after this list:
   - Outlier: OUT(T_t) = Per(T_t)
   - Step: STEP(T_t) = Per(T_t - T_{t-1})
   - DIP: DIP(T_t) = -STEP(T_t) * STEP(T_{t+1}) if (T_t - T_{t-1}) * (T_{t+1} - T_t) < 0; DIP(T_t) = 0 otherwise.
3. Spatial check based on neighbouring stations (described in detail below).
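The percentile rescaling and the single-station scores can be written compactly. The sketch below is a minimal illustration and not the published APACH code: it assumes daily values in a NumPy array, the function names are ours, the choice to apply Per to the distribution of day-to-day differences when computing STEP is an assumption, and the thresholds that map these scores onto the NeedCheck, Doubtful and Suspect flags are not shown.

import numpy as np

def per(x, ref):
    # Per(X): signed distance beyond the 25th/75th percentile of `ref`,
    # scaled by the corresponding half inter-quartile range; 0 between P25 and P75.
    p25, p50, p75 = np.nanpercentile(ref, [25, 50, 75])
    x = np.asarray(x, dtype=float)
    score = np.zeros_like(x)
    low, high = x <= p25, x >= p75
    score[low] = (x[low] - p25) / (p50 - p25)    # negative for low outliers
    score[high] = (x[high] - p75) / (p75 - p50)  # positive for high outliers
    return score

def single_station_scores(t):
    # OUT, STEP and DIP scores for one daily temperature series `t`
    # (assumes a non-degenerate inter-quartile range).
    t = np.asarray(t, dtype=float)
    out = per(t, t)                              # OUT(T_t) = Per(T_t)
    dt = np.full_like(t, np.nan)
    dt[1:] = t[1:] - t[:-1]                      # T_t - T_{t-1}
    # Assumption: Per is applied to the distribution of the day-to-day steps.
    step = per(dt, dt[1:])                       # STEP(T_t) = Per(T_t - T_{t-1})
    dip = np.zeros_like(t)
    # DIP(T_t) = -STEP(T_t) * STEP(T_{t+1}) when the series changes direction at t
    change = dt[1:-1] * dt[2:] < 0               # (T_t - T_{t-1}) * (T_{t+1} - T_t) < 0
    dip[1:-1] = np.where(change, -step[1:-1] * step[2:], 0.0)
    return out, step, dip

In APACH such scores feed the temperature decision tree of Fig. 3; here they are simply returned.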
The spatial check (class 3) proceeds as follows:
- We pre-select neighbouring stations located less than 500 km from the analyzed station and whose correlation with it is larger than 0.8 (the correlation is computed only between data of the same calendar month: January with January, February with February, and so on).
- For each pre-selected neighbouring station, we compute a linear regression between the analyzed and the neighbouring daily data.
- We then compute an interpolated value at the analyzed station from the neighbours' regression estimates.
- We compute the difference between the analyzed and the interpolated value, normalized by the standard deviation of the daily temperature for the calendar month of the day being checked. The larger this normalized distance, the stronger the confidence that the datum is erroneous.
- Finally, we compute the angle around the analyzed station covered by the neighbours. For a given distance, the larger the angle covered by the neighbouring stations, the stronger the assertion that the datum is Useful or Suspect; similarly, for a given angle, the larger the distance, the stronger the assertion. A sketch of this spatial check follows this list.
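The sketch below illustrates the spatial check under stated assumptions; it is not the APACH implementation. The interpolation weights (squared correlation), the approximate bearing formula, the haversine distance and the 30-day minimum overlap are our own choices; the poster only specifies the 500 km radius, the 0.8 correlation threshold, the per-month regressions and the normalization by the monthly standard deviation.

import numpy as np

EARTH_RADIUS_KM = 6371.0

def distance_km(lat1, lon1, lat2, lon2):
    # Great-circle (haversine) distance between two points, in km.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def spatial_check(target, neighbours, coords, day, month_mask, month_std):
    # target     : 1-D array, daily series of the analyzed station
    # neighbours : dict name -> 1-D array on the same calendar as `target`
    # coords     : dict name -> (lat, lon); the analyzed station is coords['target']
    # day        : index of the day to check
    # month_mask : boolean array selecting all days of the same calendar month
    # month_std  : standard deviation of `target` for that calendar month
    lat0, lon0 = coords['target']
    estimates, weights, bearings = [], [], []
    for name, series in neighbours.items():
        lat, lon = coords[name]
        if distance_km(lat0, lon0, lat, lon) > 500.0:   # keep stations within 500 km
            continue
        x, y = series[month_mask], target[month_mask]
        ok = ~np.isnan(x) & ~np.isnan(y)
        if ok.sum() < 30:                               # minimum overlap (our choice)
            continue
        r = np.corrcoef(x[ok], y[ok])[0, 1]
        if r <= 0.8:                                    # keep correlations above 0.8
            continue
        if np.isnan(series[day]):
            continue
        slope, intercept = np.polyfit(x[ok], y[ok], 1)  # regression: neighbour -> target
        estimates.append(slope * series[day] + intercept)
        weights.append(r ** 2)                          # assumed weighting scheme
        # approximate bearing of the neighbour as seen from the analyzed station
        bearings.append(np.degrees(np.arctan2(lon - lon0, lat - lat0)) % 360.0)
    if not estimates:
        return np.nan, 0.0
    interpolated = np.average(estimates, weights=weights)
    norm_diff = (target[day] - interpolated) / month_std
    b = np.sort(bearings)
    gaps = np.diff(np.append(b, b[0] + 360.0))          # angular gaps between neighbours
    coverage = 360.0 - gaps.max()                       # angle covered around the station
    return norm_diff, coverage

In the poster's logic, the returned normalized difference and angular coverage are then weighed against each other by the decision tree of Fig. 3; the thresholds themselves are not reproduced here.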
PRECIPITATION
The procedure to control the quality of the daily precipitation data is based on two classes of tests (see Boulanger et al., 2008 for a description of these complex tests):
1. Extreme daily precipitation
2. Extreme dry sequences
A loose illustration of both ideas follows this list.
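Because the actual tests are described only in Boulanger et al. (2008), the sketch below is no more than a simplified illustration of the two ideas: daily totals far above a station's climatology, and long dry runs at a station while its neighbours record rain. Every threshold in it (the 0.1 mm wet-day limit, the 99.9th percentile, the 30-day run length, the 1 mm neighbour mean) is an illustrative choice of ours, not an APACH value.

import numpy as np

def flag_extreme_precip(p, q=99.9):
    # Flag daily totals far above the station's wet-day climatology.
    wet = p[p >= 0.1]                                   # wet days only (illustrative 0.1 mm limit)
    threshold = np.nanpercentile(wet, q)
    return p > threshold                                # candidate outliers, to be checked manually

def flag_dry_sequences(p, neighbour_p, max_run=30):
    # Flag dry runs of at least `max_run` days at the target station during which
    # the neighbouring stations did record rain.  `neighbour_p` is a 2-D array
    # (stations x days) on the same calendar as `p`.  A run still open at the end
    # of the series is left unflagged in this sketch.
    flags = np.zeros(p.size, dtype=bool)
    run_start = None
    for i, value in enumerate(p):
        if value < 0.1:                                 # dry day at the target station
            if run_start is None:
                run_start = i
            continue
        if run_start is not None and i - run_start >= max_run:
            if np.nanmean(neighbour_p[:, run_start:i]) > 1.0:   # neighbours were not dry
                flags[run_start:i] = True
        run_start = None
    return flags

Whatever such screens flag still requires manual review, as the RESULTS section below stresses for the high precipitation values.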
RESULTS
The number of flagged temperature data follows the history of the National Weather Service fairly closely. Before October 1967, when most of the data were filed on paper cards that were often lost or badly digitized because of humidity and dust, and were not quality-controlled, our tests detected a larger number of flagged data (especially Doubtful and Suspect; Fig. 4). During the 1967-1997 period, when the NWS applied a manual quality control, we observed a minimum in the number of detections (Fig. 4). The percentage of doubtful or suspect data is larger in regions where the network density is higher (Fig. 5).
The extreme precipitation test aims at detecting very high precipitation values. Flagged data were more numerous before 1976 than afterwards (Fig. 6). This result suggests that, although they are difficult to detect, some erroneous high precipitation values may remain in the database, and our flagged data therefore require a thorough manual quality control. This will be carried out in the future by the National Weather Service.
The drought sequence test relies largely on neighbouring stations. As in the temperature case, more drought sequences (flagged as NeedCheck, Doubtful or Suspect) were found at the beginning of the period. A minimum was observed in the early 1970s, before the military coup of 1976. Since then, a weak positive trend in the number of flagged dry days (especially NeedCheck) has been detected. It is possible that the decrease in the number of active stations since 1976 (Fig. 2) has reduced the number of neighbouring stations available to our tests and thus increased the number of NeedCheck flags.
Our method will be applied in the framework of the CLARIS LPB project.

FIGURES
Fig. 1: Spatial location of the Argentine weather stations providing daily temperature and precipitation records during all or part of the 1959-2005 period.
Fig. 2: Evolution of the weather station network during the 1959-2005 period. Only stations with less than 10% of missing data are counted, which leads to different but similar curves for minimum temperature (dashed), maximum temperature (dash-dot) and precipitation (solid).
Fig. 3: Decision tree for daily minimum and maximum temperature. All codes are explained in the tables to the right of the tree.
Fig. 4: Percentage of minimum (solid) and maximum (dash-dot) temperature observations classified as NeedCheck (top panel), Doubtful (middle panel) and Suspect (bottom panel).
Fig. 5: Percentage of NeedCheck, Doubtful and Suspect values for minimum (upper panel) and maximum (lower panel) temperature, computed relative to the total number of observed values. Filled circles: percentage larger than 15% (NeedCheck), 0.1% (Doubtful) and 0.05% (Suspect). Open circles: percentage larger than 5% (NeedCheck), 0.04% (Doubtful) and 0.01% (Suspect). Crosses: percentage smaller than 5% (NeedCheck), 0.04% (Doubtful) and 0.01% (Suspect).
Fig. 6: Interannual variability of the number of potential outliers in daily precipitation.
Fig. 7: (upper panel) Seasonal cycle of the number of dry days classified as NeedCheck (solid line), Doubtful (dashed) and Suspect (dash-dot); the Doubtful and Suspect counts have been multiplied by 100 so that they can be displayed on the same plot as the NeedCheck values. (middle panel) Interannual variability of the percentage of NeedCheck (solid), Doubtful (dashed) and Suspect (dash-dot) cases. (lower panel) Interannual variability of the total percentage of the NeedCheck, Doubtful and Suspect cases.