TRAINING SESSION ON HOMOGENISATION METHOD Bologna, 17th-18th May 2005 Maurizio Maugeri, University of Milan Our approach to homogenisation
Within this context, a research program was set up in 2000 to investigate more thoroughly how data quality and homogeneity issues affect the detection of Italian temperature and precipitation trends over the last two centuries. Final goal: to revise and update the results presented in Maugeri and Nanni (1998), Buffoni et al. (1999) and Brunetti et al. (2000). The program has been developed within both the EU project ALP-IMP and the national project CLIMAGRI (Climate and Agriculture, Ministry for Agriculture and Forests).
Principal steps of the program: data and metadata recovery; homogeneity testing and record adjusting; data analysis; understanding local versus larger scale.
Why spend more time on data and metadata recovery? Italy is well placed in the field of long-term records: invention of some of the principal meteorological instruments; introduction of the first synoptic network; six series beginning in the 18th century (Bologna, Milano, Roma, Padova, Palermo and Torino). So, over the last three centuries a huge amount of data and metadata has been collected in Italian data archives.
The importance of these data has been known for a long time… Cantù V. and Narducci P. (1967) Lunghe serie di osservazioni meteorologiche. Rivista di Meteorologia Aeronautica, Anno XXVII, n. 2, 71-79. Eredia F. (1908) Le precipitazioni atmosferiche in Italia dal 1880 al 1905. In: Annali dell'Ufficio Centrale di Meteorologia, Serie II, Vol. XXVII, anno 1905, Rome. Eredia F. (1919) Osservazioni pluviometriche raccolte a tutto l'anno 1915 dal R. Ufficio Centrale di Meteorologia e Geodinamica. Ministero dei Lavori Pubblici, Rome. Eredia F. (1925) Osservazioni pluviometriche raccolte nel quinquennio 1916-1920 dal R. Ufficio Centrale di Meteorologia e Geodinamica. Ministero dei Lavori Pubblici, Rome. Mennella C. (1967) Il Clima d'Italia. Fratelli Conti Editori, Napoli, 724 pp. Millosevich (1882) Sulla distribuzione della pioggia in Italia. In: Annali dell'Ufficio Centrale di Meteorologia, Serie II, Vol. III, anno 1881, Rome. Millosevich (1885) Appendice alla memoria sulla pioggia in Italia. In: Annali dell'Ufficio Centrale di Meteorologia, Serie II, Vol. V, anno 1883, Rome. Narducci P. (1991) Bibliografia Climatologica Italiana. Consiglio Nazionale dei Geometri, Roma.
… but until a few years ago only a small amount of the data was available in digital format… Adapted from: Anzaldi C., Mirri L. and Trevisan V., 1980: Archivio Storico delle osservazioni meteorologiche, Pubblicazione CNR AQ/5/27, Roma. …and no attempts were made to collect the metadata
Data and metadata collection: other variables. … the activities are still in progress (EU project ALP-IMP) … They concern air pressure, cloud cover, humidity and snow. AIR PRESSURE: secular records. CLOUD COVER: secular records. HUMIDITY (i.e. dry / wet temperatures): daily data, 2 records. SNOW (HS: snow at ground; HN: fresh snow): daily / monthly data, about 15 records of northern Italy. 1951-2004 PERIOD: all variables available in digital format (Italian Air Force data-set). The role of national and international projects: CLIMAGRI (MiPAF), ALP-IMP (EU), COFIN and FIRB (MIUR).
Data and metadata collection: metadata. For full details, see the CLIMAGRI project web site (www.climagri.it). Metadata collection was performed with two main objectives: i) to understand the evolution of the Italian meteorological network; ii) to reconstruct the history of all the stations of the data-set. The research on the history of the individual stations was performed both by analysing a large amount of grey literature and by means of the UCEA archive. All information was summarized in a card for each data series. Each card is divided into three parts. The first part reports all the information obtained from the literature. The second part contains abstracts of the correspondence between the stations and the Central Office. The third part summarizes the sources of the data used to construct the record.
Metadata, for every station: abstracts of all published papers (grey literature); abstracts of the correspondence between the observatories and the Central Office; position; data sources; data availability; other notes. For more details, see the CLIMAGRI project web site.
Homogenisation: principal steps. We developed a method consisting of: 1) making a synthesis of the metadata and studying the impact of possible changes; 2) performing an initial homogenisation by means of direct methodologies; 3) performing a final homogenisation by means of indirect methodologies.
Metadata: for every station. Maugeri, M., Buffoni, L., Chlistovsky, F., 2002: Daily Milan temperature and pressure series (1763-1998): history of the observations and data and metadata recovery, Climatic Change, 53, 101-117.
Corrections by means of metadata: an example. Corrections applied to Milan daily air pressure data to eliminate the bias introduced by calculating daily means from observations taken at A: 8 a.m., 2 p.m. and 7 p.m. and B: sunrise and mid-afternoon. Corrections A apply to the period December 1st, 1932 - December 31st, 1987, corrections B to 1763-1834. Maugeri, M., Buffoni, L., Delmonte, B., Fassina, A., 2002: Daily Milan temperature and pressure series (1763-1998): completing and homogenising the data, Climatic Change, 53, 119-149.
Homogenisation by means of indirect methods. The indirect methods make use of meteorological data from neighbouring stations. Formally, the data of a given series can be represented as a sum of several terms. Let $X(t)$ be the value of the meteorological variable $X$ at time $t$. It can then be written as $X(t) = N + A(t) + IH(t)$, $t = 1, 2, \dots, n$ (1), where $N$ is the normal value of $X$ (defined as the mean over a suitable time interval, for example the period 1961-1990), $A(t)$ is the anomaly at time $t$ (the departure of $X$ from its normal value) and $IH(t)$ is the possible inhomogeneity in the measured value $X(t)$ (in the simplest case, $IH(t)$ is a step function that equals 0 until the inhomogeneity-inducing event takes place and thereafter equals a constant value representing the effect of the inhomogeneity). With analogous notation, a reference series consisting, for example, of the data of a neighbouring station can be written as follows:
$R(t) = N' + A'(t) + IH'(t)$, $t = 1, 2, \dots, n$ (2), where the primed quantities are the normal value, anomaly and inhomogeneity of the reference series. If the two series belong to the same climatic area, it can be assumed that $A'(t) = A(t)$ for each value of $t$. Moreover, if the reference series is postulated to be homogeneous, then $IH'(t) = 0$ always. Therefore, the series of the differences will be $Z(t) = X(t) - R(t) = (N - N') + IH(t)$, $t = 1, 2, \dots, n$ (3). In other words, in the absence of inhomogeneities the series of the differences must be constant. The same approach is followed for the series of the ratios, which is used particularly for precipitation series. Deviations of $Z(t)$ from a constant path are therefore assumed to be due to inhomogeneities.
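As a rough illustration of relation (3), the Python sketch below builds a test series and a reference series that share the same anomaly, adds a step inhomogeneity to the test series only, and checks that the difference series is constant on each side of the step. All numerical values (normals, step position, step size, series length) are hypothetical and chosen only for the example.

```python
import random

def difference_series(x, r):
    """Return Z(t) = X(t) - R(t) for two series of equal length."""
    if len(x) != len(r):
        raise ValueError("series must have the same length")
    return [x_t - r_t for x_t, r_t in zip(x, r)]

# Hypothetical illustration values, not taken from any real station.
rng = random.Random(0)
n = 100
anomaly = [rng.gauss(0.0, 0.9) for _ in range(n)]  # shared anomaly A(t)
normal_x, normal_r = 13.3, 12.1                    # normals N and N'
step_at, step_size = 60, 0.8                       # IH(t): step function

# Test series: normal + anomaly + step inhomogeneity from step_at onwards.
x = [normal_x + a + (step_size if t >= step_at else 0.0)
     for t, a in enumerate(anomaly)]
# Reference series: assumed homogeneous, so IH'(t) = 0.
r = [normal_r + a for a in anomaly]

z = difference_series(x, r)
mean_before = sum(z[:step_at]) / step_at
mean_after = sum(z[step_at:]) / (n - step_at)
```

Here `mean_before` is N - N' and `mean_after` is N - N' plus the step size, so the shared anomaly cancels and only the inhomogeneity remains visible in Z(t).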
The application of indirect methodologies is actually much more complicated than the previous relations suggest. Whenever a relation such as (3) contains a signal characterised by one or more steps, it is usually very hard to tell whether the signal is due to the station under examination or to the station used as a reference. Moreover, over periods that are not too short, both stations may present significant inhomogeneities, giving several step-shaped signals. So the identification of a reference series is actually very problematic…
How do we select the reference series? A procedure that rejects the a priori existence of homogeneous reference series is used. Each series is tested against each other series in subgroups of 10 series. Subsequently, the break signals of one series against all others are collected in a decision matrix and the breaks are assigned to the single series according to metadata and/or to probability.
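The bookkeeping behind such a decision matrix can be sketched in Python as follows. This is only a simplified illustration under stated assumptions: the function names, the input format (a mapping from pairs of series to the years in which their comparison shows a break) and the majority rule used to assign breaks are all hypothetical, and the sketch ignores metadata and assumes that break years match exactly across comparisons, which real tests would not guarantee.

```python
from collections import defaultdict

def build_decision_matrix(series_ids, pair_breaks):
    """Count, per series and year, how many pairwise comparisons flag a break."""
    matrix = {sid: defaultdict(int) for sid in series_ids}
    for (first, second), years in pair_breaks.items():
        for year in years:
            # A break in a pairwise comparison may belong to either series.
            matrix[first][year] += 1
            matrix[second][year] += 1
    return matrix

def assign_breaks(matrix, min_fraction=0.5):
    """Assign a break to a series if it shows up against most other series.

    min_fraction is a hypothetical threshold, not a value from the source.
    """
    n_others = len(matrix) - 1
    assigned = {}
    for sid, counts in matrix.items():
        years = sorted(y for y, c in counts.items()
                       if c / n_others > min_fraction)
        if years:
            assigned[sid] = years
    return assigned

# Hypothetical subgroup of four series: A has a real break in 1900,
# while the B-C comparison shows an isolated signal in 1950.
series_ids = ["A", "B", "C", "D"]
pair_breaks = {
    ("A", "B"): {1900}, ("A", "C"): {1900}, ("A", "D"): {1900},
    ("B", "C"): {1950}, ("B", "D"): set(), ("C", "D"): set(),
}
matrix = build_decision_matrix(series_ids, pair_breaks)
assigned = assign_breaks(matrix)
```

In this toy case the 1900 break is flagged in all three comparisons involving A and is assigned to A, while the isolated 1950 signal is left unassigned and would need metadata to be resolved.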
How do we compare the test and the reference series? The comparison between a test series and a reference series can be performed by a number of different mathematical techniques. We use one of them: the Craddock homogeneity test.
Homogenisation: the Craddock statistical test. One of the most commonly used statistical tests is the Craddock test. It was first developed for analysing precipitation series and has subsequently been widely updated, improved and extended to thermometric records. It accumulates the normalized differences between two series (a and b) according to one of two formulas, in which the mean values of the series are calculated over the entire period of the comparison. The choice of formula depends on the underlying hypothesis, namely whether the difference or the ratio between stations of the same area is taken to be constant.
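The formulas themselves are not reproduced in this transcript. A minimal Python sketch is given below, assuming the form in which the Craddock test is commonly written: s_i = s_(i-1) + a_i + (b_m - a_m) - b_i for the constant-difference hypothesis and s_i = s_(i-1) + a_i (b_m / a_m) - b_i for the constant-ratio hypothesis, with a_m and b_m the series means over the comparison period. The function name `craddock` and the `mode` argument are hypothetical, and the exact formulas shown on the original slide may differ.

```python
def craddock(a, b, mode="difference"):
    """Cumulative Craddock series for test series a and reference series b.

    mode="difference": assumes a constant difference (e.g. temperature).
    mode="ratio": assumes a constant ratio (e.g. precipitation).
    """
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    a_m = sum(a) / len(a)  # mean of a over the whole comparison period
    b_m = sum(b) / len(b)  # mean of b over the whole comparison period
    total, curve = 0.0, []
    for a_i, b_i in zip(a, b):
        if mode == "difference":
            total += a_i + (b_m - a_m) - b_i
        elif mode == "ratio":
            total += a_i * (b_m / a_m) - b_i
        else:
            raise ValueError("mode must be 'difference' or 'ratio'")
        curve.append(total)
    return curve

# Toy example: series a jumps by 2 units after the third value.
a = [10.0, 10.0, 10.0, 12.0, 12.0, 12.0]
b = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
curve_diff = craddock(a, b, mode="difference")
curve_ratio = craddock(a, b, mode="ratio")
```

For a homogeneous pair the curve fluctuates around zero; a step in one series gives a curve with a change of slope at the break, here the minimum at the third value. By construction the curve returns to zero at the end of the comparison period.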
See also the example presented on the craddock.xls Excel File.
To show the ability of the Craddock homogeneity test to identify some typical inhomogeneities, we have used records generated by means of random numbers. In particular, we have generated records with the features of Milan yearly mean temperature and yearly total precipitation. TEMPERATURE: series length 240 values; average 13.3 °C; st. dev. 0.9 °C. PRECIPITATION: series length 240 values; average 1015 mm; st. dev. 202 mm. We then applied the Craddock test to A) pairs of completely random temperature/precipitation records and B) pairs of records obtained partly from random series and partly from the test series itself (i.e. we introduced a 0.7 correlation between the two series subjected to the Craddock homogeneity test). Finally, we added some typical errors to the series, such as step functions, trends, … All results are displayed in the Excel files Craddock_TMED_1+2 and Craddock_PREC_1+2.
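An experiment of this kind can be approximated in Python as sketched below. This is not the procedure used in the Excel files: the way the 0.7 correlation is imposed, the random seed, the step position and the step size are all assumptions made for the illustration, and the Craddock curve uses the commonly quoted constant-difference form.

```python
import math
import random

def craddock(a, b):
    """Cumulative Craddock series (constant-difference hypothesis)."""
    a_m, b_m = sum(a) / len(a), sum(b) / len(b)
    total, curve = 0.0, []
    for a_i, b_i in zip(a, b):
        total += a_i + (b_m - a_m) - b_i
        curve.append(total)
    return curve

def correlated_pair(n, mean, sd, rho, rng):
    """Two Gaussian series with the given mean and sd, correlated about rho."""
    x_std = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The second series is partly the first one and partly independent noise.
    y_std = [rho * x + math.sqrt(1.0 - rho ** 2) * e
             for x, e in zip(x_std, noise)]
    return [mean + sd * v for v in x_std], [mean + sd * v for v in y_std]

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2005)          # arbitrary seed, for reproducibility
N_YEARS, MEAN_T, SD_T, RHO = 240, 13.3, 0.9, 0.7
test, ref = correlated_pair(N_YEARS, MEAN_T, SD_T, RHO, rng)

STEP_AT, STEP_SIZE = 120, 1.0      # hypothetical step of +1.0 °C
test_with_step = [v + (STEP_SIZE if i >= STEP_AT else 0.0)
                  for i, v in enumerate(test)]

curve = craddock(test_with_step, ref)
# The largest excursion of the curve is expected near the step.
turning_point = max(range(N_YEARS), key=lambda i: abs(curve[i]))
```

Replacing the step with a linear trend, or using the precipitation parameters (1015 mm, 202 mm) with a ratio-based curve, would follow the same pattern.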
Homogenisation: statistical test and metadata. Craddock test - Bologna precipitation record. "At the beginning of 1857 this pluviometer, reduced to a poor state by long use, was replaced by another of better construction, made with great precision..." Introduction of a new pluviometer (Fuess recorder): "... it was installed under the care of Prof. Bernardo Dessau in the period 1900-1903..." Change in data origin: from Osservatorio Astronomico to Istituto Idrografico. News about damage to the pluviometer: when the damage was repaired, the cause of the underestimation of precipitation during the period 1900-1928 was removed.
Basic problem: what has to be corrected? a) All the periods found by statistical methods; b) only the periods for which there is evidence in metadata. The problem is, in part, still open. Our methodology: wide use of statistical methods (especially for air temperature); critical analysis in the light of metadata. The CLIMAGRI project