5.1.2 Analysis of stressors-responses relations with decision trees
Lidija Globevnik, Nataša Atanasova, Mateja Škerjanec, Maja Koprivšek (University of Ljubljana, Faculty of Civil and Geodetic Engineering)
WP5 Meeting: Ljubljana, June 2015

Investigation of the correspondence of pressures/stressors with water quality data and geo-climatic factors
The geodatabase will contain datasets on multiple stressors, ecological status, water quantity, water quality and ecosystem services. We will use a data-driven modelling approach (namely regression and classification trees) to investigate the relationship between pressures/stressors, geo-climatic factors and the state of the water bodies.

Data-driven modelling approach and decision trees
The goal of data-driven methods is to learn the dependencies between the inputs and the outputs of the observed system from measured data. Decision tree learning is a commonly used method in data mining. A tree can be learned by splitting the source data set into subsets based on attribute value tests and repeating the splitting on each subset. There are two types of decision trees:
- Classification trees: the predicted (target) variable is a class.
- Regression trees: the target variable is numeric (continuous).
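To make "splitting the data set based on attribute value tests" concrete, the sketch below scores every candidate threshold on one numeric attribute by information gain (the reduction in class entropy), the criterion behind ID3 (C4.5 refines it into the gain ratio). It is an illustration only: the slides do not say which WEKA learner or split criterion was used, and the forest-cover values and labels in the usage line are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary test 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, 0.0) if no split reduces entropy.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        gain = base - remainder
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Invented example: forest share (%) of five hinterlands and their status.
thr, gain = best_threshold([95, 92, 80, 60, 55], ["Good", "Good", "Moderate", "Moderate", "Poor"])
print(thr, round(gain, 3))  # 86.0 0.971
```

A full learner would apply this search to every attribute, keep the best test, and recurse on the two resulting subsets.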

Classification trees
Classification trees are used to separate the data set into classes.

Data set (examples):
ATT 1 | ATT 2 | ATT 3 | … | TARGET (CLASS)
… | … | … | … | Poor
… | … | … | … | Moderate
… | … | … | … | Good
… | … | … | … | Good
… | … | … | … | Poor
… | … | … | … | Poor
… | … | … | … | Moderate
… | … | … | … | Poor
… | … | … | … | Good
… | … | … | … | Moderate
… | … | … | … | Moderate
… | … | … | … | Good
… | … | … | … | …

[Figure: data set (examples) → classification tree with leaves class1, class2, class3 → set of IF-THEN rules]

Set of IF-THEN rules:
IF (ATT 1 ≤ value1) THEN (class = class1)
IF (ATT 1 > value1 and ATT 2 ≤ value2) THEN (class = class2)
IF (ATT 1 > value1 and ATT 2 > value2) THEN (class = class3)
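The tree-to-rules step can be illustrated in a few lines of Python: each root-to-leaf path becomes one IF-THEN rule whose conditions are the tests met along that path. The nested-dict tree format (keys `att`, `thr`, `le`, `gt`, `label`) is a hypothetical representation chosen for this sketch, not WEKA's output format; it encodes the example tree above.

```python
def tree_to_rules(node, conditions=()):
    """Yield (conditions, class) pairs, one per root-to-leaf path."""
    if "label" in node:  # leaf
        yield list(conditions), node["label"]
        return
    att, thr = node["att"], node["thr"]
    yield from tree_to_rules(node["le"], conditions + (f"{att} <= {thr}",))
    yield from tree_to_rules(node["gt"], conditions + (f"{att} > {thr}",))


def format_rule(conditions, label):
    return f"IF ({' and '.join(conditions)}) THEN (class = {label})"


# The example tree: test ATT1 first, then ATT2 on the right branch.
tree = {
    "att": "ATT1", "thr": "value1",
    "le": {"label": "class1"},
    "gt": {"att": "ATT2", "thr": "value2",
           "le": {"label": "class2"},
           "gt": {"label": "class3"}},
}
for conds, label in tree_to_rules(tree):
    print(format_rule(conds, label))
```

This prints the three rules listed on the slide, in the same order.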

Regression trees
Regression trees are needed when the target variable is numeric (continuous); they are used to predict the target value.

Data set (examples): ATT 1 | ATT 2 | ATT 3 | … | TARGET [table of examples with numeric TARGET values; only one value, 1.636, is legible]

Regression tree (nodes hold the tests; leaves hold the equations that predict the target variable):
ATT 1 < 10?
  YES: TARGET = 0.2∙ATT 1 + 3∙ATT 2 + 4∙ATT 3
  NO: ATT 2 < 0.011?
    YES: TARGET = 0.01∙ATT 1 + 0∙ATT 2 + 5∙ATT 3
    NO: TARGET = 2∙ATT 1 + …∙ATT 2 + 0∙ATT 3

Set of equations for the prediction of a target value (i.e. a regression model):
IF (ATT 1 < 10) THEN (target = 0.2∙ATT 1 + 3∙ATT 2 + 4∙ATT 3)
IF (ATT 1 ≥ 10 and ATT 2 < 0.011) THEN (target = 0.01∙ATT 1 + 0∙ATT 2 + 5∙ATT 3)
IF (ATT 1 ≥ 10 and ATT 2 ≥ 0.011) THEN (target = 2∙ATT 1 + …∙ATT 2 + 0∙ATT 3)
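In the example tree each leaf holds a linear equation rather than a constant, so prediction means first finding the leaf and then evaluating its equation. The sketch below does that for one example. The third leaf's ATT 2 coefficient is not legible in the source, so the function takes it as an explicit argument (`att2_coef_leaf3`, a hypothetical parameter); examples with ATT 1 exactly equal to 10 go to the NO branch, as the `ATT 1 < 10` test implies.

```python
def linear(coefs):
    """Leaf model without intercept: sum of coefficient * attribute value."""
    return lambda x: sum(c * v for c, v in zip(coefs, x))


def example_model_tree(att2_coef_leaf3):
    """Predictor for the example tree; x = (ATT1, ATT2, ATT3).

    att2_coef_leaf3 is illegible in the source, so it must be supplied.
    """
    leaf1 = linear((0.2, 3.0, 4.0))
    leaf2 = linear((0.01, 0.0, 5.0))
    leaf3 = linear((2.0, att2_coef_leaf3, 0.0))

    def predict(x):
        att1, att2, _ = x
        if att1 < 10:        # YES branch of "ATT 1 < 10"
            return leaf1(x)
        if att2 < 0.011:     # YES branch of "ATT 2 < 0.011"
            return leaf2(x)
        return leaf3(x)

    return predict


predict = example_model_tree(att2_coef_leaf3=1.0)  # 1.0 is a placeholder, not the slide's value
print(predict((5, 1, 2)))  # 0.2*5 + 3*1 + 4*2 = 12.0
```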

Finding the most important relationships between state and pressure/driver data
The MARS Geodatabase is prepared so that pressure data (multiple stressors) and state data (water quality) can be linked to spatial objects.

Preparing datasets
Spatial objects:
- River segments with a unique identifier "tr"
- Main drains in the FEC (functional elementary catchment) and other river segments
- Linkage of river segments with the FEC and its "hinterland" (all FECs in the drainage area): "tr" linked to "zhyd"
- Linkage of SoE monitoring stations with main drains
Water quality and quantity data:
- Data on nutrients
Pressure data:
- From EUROSTAT, E-PRTR and UWWTD; modelled data are to be included as well (Moneris, JRC - GreenModel?)
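The hinterland linkage amounts to collecting every FEC upstream of a given FEC. A minimal sketch, assuming each FEC record stores the ID of the FEC it drains into (the mapping `downstream_of` and the IDs below are hypothetical) and assuming the hinterland includes the FEC itself:

```python
from collections import defaultdict, deque


def build_upstream_index(downstream_of):
    """Invert {fec_id: id of the FEC it drains into} into {fec_id: [upstream FEC ids]}."""
    upstream = defaultdict(list)
    for fec, down in downstream_of.items():
        if down is not None:  # None marks an outlet
            upstream[down].append(fec)
    return upstream


def hinterland(fec_id, upstream):
    """Set of FECs draining to fec_id, including fec_id itself (breadth-first search)."""
    seen = {fec_id}
    queue = deque([fec_id])
    while queue:
        current = queue.popleft()
        for up in upstream.get(current, []):
            if up not in seen:
                seen.add(up)
                queue.append(up)
    return seen


# Hypothetical network: A and B drain into C; C and D drain into outlet E.
downstream_of = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
upstream = build_upstream_index(downstream_of)
print(sorted(hinterland("C", upstream)))  # ['A', 'B', 'C']
```

A SoE station would then be mapped to its FEC through the river segment it lies on (tr to zhyd) and assigned the hinterland of that FEC.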

FEC and hinterland polygons data
- Surface area
- Have (average altitude), Hmin, Hmax
- Average slope (%)
- Precipitation, temperature
- Population number, density
- Hydroecoregion, bioregion, ecoregion (FEC)
- Corine land cover (1st level)
- River Strahler order (for main drain)
- River name (main drain)
- WFD water body ID
- WFD ecological status
- Monitoring station ID on main drain
- Available water quality data (state)

Field prefixes: F = data apply to the FEC; H = data apply to the hinterland; S = state (water quality).

Field | Description | Unit
WaterbaseI | WISE SoE quality station ID |
tr | ID of the ECRINS river segment on which the SoE quality station is located |
ZHYD_FEC | ID of the ECRINS FEC in which the SoE quality station is located |
hinterland | Whether the SoE quality station has a hinterland (YES/NO) |
zhyd_hinterland | ID of the hinterland of the SoE quality station |
Hinterland_Area_km2 | Hinterland area | km2
strahler | Strahler order of the tr on which the SoE quality station is located |
SoE_RivM | Name of the river |
WaterBody_ID | WFD water body ID |
WFD_ecol_st | Ecological status of the river segment from the WFD |
SoE_RiverDisch | River discharge (from the SoE quality database) |
DEM_altitu | Altitude of the SoE quality station from the DEM |
H_CLC1 | Agricultural areas | share of hinterland area
H_CLC2 | Artificial surfaces | share of hinterland area
H_CLC3 | Forest and semi-natural areas | share of hinterland area
H_CLC4 | Water bodies | share of hinterland area
H_CLC5 | Wetlands | share of hinterland area
F_DEM_A | Average (mean) altitude derived from the DEM | m a.s.l.
F_DEM_Mi | Minimum altitude derived from the DEM | m a.s.l.
F_DEM_Mx | Maximum altitude derived from the DEM | m a.s.l.
F_SLOP_A | Average slope derived from the DEM | percent rise
F_PON_A_W | Population count of the World Version 3 (Wv3), year 2000 | people
F_POD_A_W | Population density of the World Version 3 (Wv3), year 2000 | people/km2
F_POD_A_JRC | Population density disaggregated with Corine land cover 2000, year 2000 | people/km2
F_AR_km2 | Functional elementary catchment (FEC) area | m2
F_PRE_5000 | Average yearly precipitation for the period … | mm/year
F_PRE1_5000 | Average January precipitation for the period … | mm/month
F_PRE7_5000 | Average July precipitation for the period … | mm/month
F_TEM_5000 | Average yearly temperature for the period … | °C
F_TEM1_5000 | Average January temperature for the period … | °C
F_TEM7_5000 | Average July temperature for the period … | °C
F_ECOR_ID | Ecoregion ID (AREA_ID), 1-25 | share of FEC area
F_BIOR_ID | Biogeographical region ID (ABBRE) | share of FEC area
F_HER_ID | Hydro-ecoregion ID (HERCODE), European Hydro-Ecoregions | share of FEC area
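Fields H_CLC1 to H_CLC5 are shares of the hinterland area. Given the set of FECs forming a hinterland, one way to compute them is to add up each class's area over those FECs and divide by the total area. The input dictionaries below are hypothetical, and the sketch assumes per-FEC class areas have already been derived from Corine land cover.

```python
def hinterland_shares(fec_ids, fec_area_km2, class_area_km2):
    """Percentage of the hinterland area covered by each land-cover class.

    class_area_km2[fec][cls] is the area of class cls inside that FEC.
    """
    total = sum(fec_area_km2[f] for f in fec_ids)
    per_class = {}
    for f in fec_ids:
        for cls, area in class_area_km2[f].items():
            per_class[cls] = per_class.get(cls, 0.0) + area
    return {cls: 100.0 * area / total for cls, area in per_class.items()}


# Hypothetical two-FEC hinterland (areas in km2).
fec_area_km2 = {"A": 10.0, "B": 30.0}
class_area_km2 = {"A": {"H_CLC3": 9.0, "H_CLC1": 1.0},
                  "B": {"H_CLC3": 15.0, "H_CLC1": 15.0}}
shares = hinterland_shares({"A", "B"}, fec_area_km2, class_area_km2)
print(shares["H_CLC3"], shares["H_CLC1"])  # 60.0 40.0
```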

Data from EUROSTAT (mainly on agriculture): farms, livestock, crops, irrigation
All data are at NUTS 2 level, for the year 2010. Where no values were reported for 2010, we used averages (from 2005 to 2009) or the last reported values.

Field | Description | Unit
H_beehives | Beehives | number
H_cattle | Cattle | heads
H_dairy_cattle | Dairy cattle | heads
H_equidae | Equidae | heads
H_farms_ls | Farms with livestock | number
H_irr_area | Total irrigable area | ha
H_irr_volume | Irrigation water volume | m3
H_maize | Maize yield | 100 kg/ha
H_oth_cattle | Other cattle | heads
H_oth_pigs | Other pigs | heads
H_pigs | Pigs, total | heads
H_potatoes | Potato yield | 100 kg/ha
H_poultry | Poultry | 1000 heads
H_rabbits | Rabbits | heads
H_sheep | Sheep | heads
H_sows | Sows | heads
H_uaa | Utilized agricultural area | ha
H_vineyards | Vineyards | 100 kg/ha
H_wheat | Wheat yield | 100 kg/ha
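The rule for missing 2010 reports can be written as a small selection function. The slide names the two fallbacks but not their priority; the sketch assumes the 2005-2009 average is preferred and that the last reported value is used only when that window is empty. The `{year: value}` input format is a hypothetical choice.

```python
from statistics import mean


def value_for_2010(series):
    """Pick a NUTS 2 value for 2010 from a {year: value} mapping.

    Assumed order: the 2010 value if reported, else the mean of the
    2005-2009 values, else the last reported value (None if nothing).
    """
    if series.get(2010) is not None:
        return series[2010]
    window = [v for y, v in series.items() if 2005 <= y <= 2009 and v is not None]
    if window:
        return mean(window)
    reported = [y for y, v in series.items() if v is not None]
    if not reported:
        return None
    return series[max(reported)]


print(value_for_2010({2007: 4, 2009: 6}))  # 5
```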

Data from WISE: UWWTD and SoE water quality

Field | Description | Unit
H_BOD_discharge | Sum of BOD discharges in the hinterland (UWWTD) | t/y
H_COD_discharge | Sum of COD discharges in the hinterland (UWWTD) | t/y
H_P_discharge | Sum of P discharges in the hinterland (UWWTD) | t/y
H_N_discharge | Sum of N discharges in the hinterland (UWWTD) | t/y
H_uwws_count | Number of UWW systems in the hinterland (UWWTD) | number
H_TN_release | Total nitrogen release (E-PRTR) | t/y
H_TP_release | Total phosphorus release (E-PRTR) | t/y
S_ammonium | Ammonium | mg/l N
S_total ammonium | Total ammonium | mg/l N
S_bod5 | BOD5 | mg/l O2
S_bod7 | BOD7 | mg/l O2
S_chlorophyll_a | Chlorophyll a | µg/l
S_codcr | COD (Cr) | mg/l O2
S_codmn | COD (Mn) | mg/l O2
S_DOC | Dissolved organic carbon | mg/l C
S_DO | Dissolved oxygen | mg/l O2
S_EC | Electrical conductivity | µS/cm
S_KN | Kjeldahl nitrogen | mg/l N
S_nitrate | Nitrate | mg/l N
S_orthophosphates | Orthophosphates | mg/l P
S_OS | Oxygen saturation | %
S_ph | pH |
S_silicate | Silicate | mg/l Si
S_T | Water temperature | °C
S_TOC | Total organic carbon (TOC) | mg/l C
S_TP | Total phosphorus | mg/l P
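The H_*_discharge and H_uwws_count fields aggregate point sources over the hinterland. A sketch, assuming each UWWTD record has already been assigned to the FEC it discharges into; the record layout (keys `fec`, `BOD`, `COD`, `N`, `P`) is hypothetical.

```python
def hinterland_discharges(hinterland_fecs, plants, fields=("BOD", "COD", "N", "P")):
    """Sum UWWTD discharges (t/y) and count records located in the hinterland FECs."""
    inside = [p for p in plants if p["fec"] in hinterland_fecs]
    totals = {f: sum(p.get(f, 0.0) for p in inside) for f in fields}
    totals["count"] = len(inside)
    return totals


# Hypothetical records: two inside the hinterland {A, B, C}, one outside (X).
plants = [{"fec": "A", "N": 12.5, "P": 1.5},
          {"fec": "B", "N": 4.0, "P": 0.5},
          {"fec": "X", "N": 100.0, "P": 10.0}]
print(hinterland_discharges({"A", "B", "C"}, plants, fields=("N", "P")))
# {'N': 16.5, 'P': 2.0, 'count': 2}
```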

Case study: Drava river catchment (1)
[Maps: temperature (°C); precipitation (mm/year)]

Case study: Drava river catchment (2)
[Maps: ecoregions (Illies); hydro-ecoregions (Rebecca project)]

Drava river: 107 monitoring stations (water quantity)

[Figures: hinterland of station HR_RV_29111; hinterland of station AT_RV_FW]

[Maps: temperature; population density (people/km2); maize yield (100 kg/ha, 2010); irrigation water volume (m3/year, 2010); pigs (heads, 2010)]

Modelling exercise – Drava river catchment
The data described above were used to generate different classification trees with the WEKA software. We chose the ecological status of the water bodies (according to the WFD) as the target variable. The target variable falls into one of three classes:
- Good: 33 examples
- Moderate: 42 examples
- Poor: 13 examples
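With 33, 42 and 13 examples the classes are unbalanced, so it helps to know what a trivial model would score. Always predicting the most frequent class (what WEKA's ZeroR classifier does) is right in 42 of 88 cases, a reference point for the cross-validation accuracies of 64-65 % reported for the trees.

```python
counts = {"Good": 33, "Moderate": 42, "Poor": 13}
n = sum(counts.values())                 # 88 examples in total
majority = max(counts, key=counts.get)   # most frequent class
baseline = counts[majority] / n
print(f"{n} examples; always predicting {majority!r} gives {baseline:.1%} accuracy")
# 88 examples; always predicting 'Moderate' gives 47.7% accuracy
```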

Test results – Drava river catchment (1)
1st classification tree (number of parameters: 32; cross-validation (CV) accuracy: 64 %; accuracy on training data: 85 %, i.e. 85 % of all cases are classified correctly by these rules):
- The most important parameter is the eco-region: the eco-region "Hungarian Lowlands" has poor ecological status, while the eco-region "Dinaric Western Balkans" has moderate ecological status.
- In the eco-region "Alps", the most significant parameter is the percentage of urban areas, followed by the percentage of water surface.
- Interestingly, a greater number of beehives was associated with a better ecological status of the watercourse.
[Figure legend: number of beehives; CLC1: agriculture; CLC2: artificial; CLC3: forest; CLC4: water bodies]
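The gap between accuracy on the training data (85 %) and in cross-validation (64 %) is what CV is meant to expose: every example is scored by a tree that did not see it during training. A minimal stratified k-fold sketch follows. The slides do not give the number of folds, so k = 10 (the WEKA Explorer default) is an assumption, and `train_fn` is a hypothetical placeholder for any learner that returns a predict function.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split example indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        for i in members:  # deal class members round-robin over the folds
            folds[slot % k].append(i)
            slot += 1
    return folds


def cross_validated_accuracy(X, y, train_fn, k=10, seed=1):
    """Share of examples classified correctly by a model trained without them."""
    correct = 0
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        model = train_fn(train_X, train_y)
        correct += sum(model(X[i]) == y[i] for i in fold)
    return correct / len(y)
```

Plugging a majority-class learner into `cross_validated_accuracy` on the 33/42/13 class labels reproduces the 42/88 baseline, which is a quick sanity check of the fold logic.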

Test results – Drava river catchment (2)
2nd classification tree (number of parameters: 12; CV: 64 %; training set: 84 %):
- Here the most important parameter is altitude. If it is lower than … m a.s.l., the ecological status is poor.
- At higher altitudes, the most important parameter is the percentage of forest land. If forest covers more than … %, the status of the water is good; otherwise the percentage of urban areas becomes important: if it is less than 1.22 %, the status is good.
- Interestingly (and logically), the ecological status of the higher-lying sections (Strahler order ≤ 5) is better (good vs. moderate). Threshold: 17 cases support the rule, 3 do not.
[Figure legend: CLC1: agriculture; CLC2: artificial; CLC3: forest; CLC4: water bodies]
If more than 90 % of a station's hinterland is covered by forest, there is a high probability that the water body will have good ecological status (ES). If not, the outcome depends on other land uses: with less than 90 % forest and less than 1.2 % urban area, good ES can also be expected; otherwise the forest coverage is checked again. If it is below 90 % but above 70 %, the river discharge is checked. If forest is below 70 % (that is, urban area above 1.2 % and forest below 70 %), the agricultural coverage is checked: above 30 % means moderate status; otherwise the tree proceeds to a final check.
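The verbal description of tree no. 2 can be written as a function for hinterlands above the altitude split. The thresholds (90 % and 70 % forest, 1.2 % urban, 30 % agriculture) come from the text; the altitude split and the final check are left out because their details are not given, so the function returns a note instead of a class where the slide leaves the outcome open. Values exactly on a threshold are assigned by assumption.

```python
def tree2_high_altitude(forest_pct, urban_pct, agri_pct):
    """Follow the described path of tree no. 2 (hinterlands above the altitude split).

    Percentages are shares of the hinterland area. Returns a status class,
    or a note where the described path does not give a class.
    """
    if forest_pct > 90:
        return "good"
    if urban_pct < 1.2:
        return "good"
    if forest_pct > 70:
        return "depends on river discharge"
    if agri_pct > 30:
        return "moderate"
    return "final check (not described)"


print(tree2_high_altitude(forest_pct=60, urban_pct=2.0, agri_pct=35))  # moderate
```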

Test results – Drava river catchment (3)
3rd classification tree (number of parameters: 13; CV: 65 %, a more robust tree that performs better on the validation data; training data: 73 %)
In this case we used techniques to obtain smaller trees. Some information may be lost, but the tree is more robust when applied to new (validation) data. The tree is similar to tree no. 2, only slightly shorter and easier to interpret. The important attributes are altitude, percentage of forest area and percentage of water surface.
If more than 90 % of a station's hinterland is covered by forest, there is a high probability that the water body will have good ES. If not, the outcome depends on other land uses: with less than 90 % forest and more than 0.18 % water surface, moderate status can be expected. Otherwise, with less than 73 % forest, moderate status can hardly be expected.
The model indicates the importance of forest and water surface: if there is enough forest area, other activities can be afforded.
[Figure legend: CLC1: agriculture; CLC2: artificial; CLC3: forest; CLC4: water bodies]
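The slide does not name the technique used to obtain smaller trees. One common approach is reduced-error pruning: working bottom-up, a subtree is replaced by a leaf that predicts the node's majority class whenever this does not increase the number of errors on held-out data. The sketch implements that idea as an illustration (it is not necessarily what was done in WEKA). The nested-dict format, in which each internal node also stores its training majority class under `label`, is a hypothetical choice, and the toy tree and validation rows are invented.

```python
def predict(node, x):
    """Follow tests (value <= thr goes to 'le') down to a leaf."""
    while "att" in node:
        node = node["le"] if x[node["att"]] <= node["thr"] else node["gt"]
    return node["label"]


def errors(node, data):
    return sum(predict(node, x) != y for x, y in data)


def reduced_error_prune(node, data):
    """Return a pruned copy of node, judged on validation pairs (x, y)."""
    if "att" not in node:  # already a leaf
        return node
    left = [(x, y) for x, y in data if x[node["att"]] <= node["thr"]]
    right = [(x, y) for x, y in data if x[node["att"]] > node["thr"]]
    pruned = dict(node,
                  le=reduced_error_prune(node["le"], left),
                  gt=reduced_error_prune(node["gt"], right))
    leaf = {"label": node["label"]}
    # Ties (including nodes reached by no validation example) favour the smaller tree.
    if errors(leaf, data) <= errors(pruned, data):
        return leaf
    return pruned


# Invented toy tree and validation data.
toy_tree = {"att": "ATT1", "thr": 10, "label": "moderate",
            "le": {"att": "ATT2", "thr": 0.5, "label": "moderate",
                   "le": {"label": "poor"}, "gt": {"label": "moderate"}},
            "gt": {"label": "good"}}
validation = [({"ATT1": 15, "ATT2": 0.1}, "good"),
              ({"ATT1": 5, "ATT2": 0.1}, "moderate"),
              ({"ATT1": 2, "ATT2": 0.9}, "moderate")]
print(reduced_error_prune(toy_tree, validation))
```

Here the ATT2 subtree is collapsed into a single "moderate" leaf because it only adds a validation error, while the root test on ATT1 is kept.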

Conclusions (1)
1) The most important driver/pressure is land use.
- Forest cover is the dominant land use class; the threshold of forest coverage for good status is 89 % of the hinterland.
- Water surface in the hinterland is the second most important land use class: the threshold is 0.179 % (with more than 0.179 % water surface in the hinterland, moderate status can be expected).
- These are drivers that reduce pressures from other drivers.
2) For high altitudes: with enough forest area (more than 90 %), good ecological status can be expected; if not, we have to examine what else happens in the hinterland:
- if forest area is between 70 and 90 %, ecological status depends on river discharge (more is better);
- if agriculture covers more than 30 %, moderate status can be expected;
- if agriculture covers less than 30 %, then …
3) Also apparently important: irrigation and urban areas.

How to interpret and use the trees
The clear message from all models is that forest coverage is the most important attribute for the ecological status of water bodies in the Drava catchment. The models provide threshold values of the attributes, on which a land use management strategy for the hinterland can be based. For example, a clear guideline for managers is:
- In the Drava catchment, hinterlands below 161 m a.s.l. tend to be problematic regardless of land use and need more attention.
- Hinterlands above 161 m a.s.l.: if more than 90 % of the land is kept as forest, there is a high probability of good ES. With less forest, attention should be paid to the percentages of water surface, agricultural areas and urban areas, and to river discharge. These thresholds are given in the models on the previous slides.
Important: the classification trees were trained on the Drava catchment, so the information they disclose is valid for this catchment only.
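The headline guideline can be turned into a simple screening rule for a list of hinterlands. Only the two thresholds stated here (161 m a.s.l. and 90 % forest) are used; which altitude field the trees split on is not stated, so `altitude_m` is an assumption, the hinterland names and values are invented, and the thresholds apply to the Drava catchment only.

```python
def screen_hinterland(altitude_m, forest_pct):
    """Apply the Drava guideline: altitude first, then forest share (%)."""
    if altitude_m < 161:
        return "needs attention regardless of land use"
    if forest_pct > 90:
        return "good ES likely"
    return "check water surface, agriculture, urban areas and discharge"


# Hypothetical hinterlands: (name, altitude in m a.s.l., forest share in %)
examples = [("H1", 150, 95), ("H2", 420, 93), ("H3", 380, 75)]
for name, alt, forest in examples:
    print(name, "->", screen_hinterland(alt, forest))
```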

Modelling exercise – further work
- Each SoE station is affected by its corresponding drainage area. It is therefore more reasonable to aggregate data at hinterland level instead of FEC level, especially for the geo-climatic factors (e.g. average slope, average annual precipitation).
- Besides ecological status, other variables from the SoE stations can be modelled as well (e.g. P and N ranges).
- We will model other catchments and look for similarities. The size of the catchments is still to be discussed.
- We still need to include point source data (from the E-PRTR and UWWTD databases), which will hopefully improve the interpretability of the models.
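Aggregating a geo-climatic factor to hinterland level, as proposed here, can be done with an area-weighted mean over the FECs of the hinterland. A sketch with hypothetical inputs; an area-weighted mean suits quantities such as average slope or annual precipitation, whereas extremes like minimum and maximum altitude would be aggregated with min and max instead.

```python
def hinterland_mean(fec_ids, fec_area_km2, fec_value):
    """Area-weighted mean of a FEC-level attribute over the FECs of one hinterland."""
    total = sum(fec_area_km2[f] for f in fec_ids)
    return sum(fec_area_km2[f] * fec_value[f] for f in fec_ids) / total


# Hypothetical: a small, steep FEC upstream of a large, flat one.
fec_area_km2 = {"A": 10.0, "B": 30.0}
avg_slope_pct = {"A": 10.0, "B": 2.0}
print(hinterland_mean({"A", "B"}, fec_area_km2, avg_slope_pct))  # 4.0
```

Using only the station's own FEC (B) would report a 2 % slope, while the hinterland-level value reflects the steeper upstream area.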

Points for further discussion
- Which additional attributes should we include in our modelling tasks?
- Which target variables should we predict?
- Which type of decision tree seems more useful: classification or regression trees?
- Should we perform modelling tasks for single test cases, for river groups with common properties, or for Europe as a whole?
- Where do you see potential uses of the constructed decision trees within the other MARS tasks?