Exploratory Analysis of Forestry Data in NEFIS Natalia Andrienko & Gennady Andrienko FHG AIS (Fraunhofer Institute for Autonomous Intelligent Systems)

Presentation transcript:

Exploratory Analysis of Forestry Data in NEFIS Natalia Andrienko & Gennady Andrienko FHG AIS (Fraunhofer Institute for Autonomous Intelligent Systems) NEFIS Project Workshop, JRC Italy, 29 th June 2005

NEFIS and our research: Our research focus is EDA – Exploratory Data Analysis (in particular, of spatial and temporal data). In NEFIS, we strive to explain and promote the ideas and principles of EDA. We have used the ICP Forests defoliation data as a non-trivial example to demonstrate systematic, comprehensive EDA. We hope to receive valuable feedback from you to guide our further work.

What Is EDA? Emerged in statistics in the 1970s; originator: John Tukey. A philosophy and discipline of unbiased looking at data: “What can data tell me?” rather than “Do they agree with my expectations?” –Similar to the work of a detective (J. Tukey). Need to look at data → focus on visualisation and user interaction with data displays.

Purposes of EDA: Uncover peculiarities of the data and, on this basis, understand how the data should be further processed (e.g. filtered, transformed, split into parts, fused, …). Generate hypotheses for further testing (e.g. using statistical methods). Choose proper methods for in-depth analysis (possibly domain-specific). Especially important for previously unknown data, e.g. found on the Web → relevant to NEFIS.

EDA vs. other analyses: EDA does not substitute for rigorous methods of numerical analysis, either general or domain-specific, but should give an understanding of which methods to apply and how. Workflow: original data → 1. EDA → understanding of the data (mental model) → 2. Data processing → processed data → 3. In-depth analysis → conclusions, theories, decisions, …

EDA vs. information presentation: EDA makes intensive use of graphics. However, “nice” presentation and reporting are not purposes of EDA. The primary goal of presentation is to convey a certain idea or set of ideas to others –understandably –convincingly –in an aesthetically attractive way. This requires different visual means than exploration.

The defoliation data: Large volume: 6169 spatially referenced time series. Two dimensions: space and time (S&T). Many missing values. No full compatibility across countries, species, time etc.

EDA: data quality issues. Specialists’ opinion (after seeing the draft report of the data exploration): “The data were not meant for analysis!” But: 1. There are no ideal data (especially on the Web and for free). 2. Even to understand that data are inadequate, one first needs to explore them. 3. Even imperfect data can be useful. 4. The principles of EDA (demonstrated further) are applicable to perfect data as well.

General procedure of the EDA: 1. See the whole –Space + Time → 2 complementary views: 1) Evolution of spatial patterns in time; 2) Distribution of temporal behaviours in space. 2. Divide and focus –Data are complex → have to be explored by slices and subsets (species, age groups, countries, years, …). 3. Attend to particulars –Detect outliers, strange behaviours, unexpected patterns, …

See the whole: Handle large data volumes. General approach: data aggregation. Task 1: Explore evolution of spatial patterns. Appropriate data transformation: aggregate by small space compartments (regular grid with 4025 cells); separately for different species; various aggregates (mean, max) → Gain: no symbol overlapping.
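The grid aggregation described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the function name `aggregate_by_grid`, the `(x, y, value)` input layout and the `cell_size` parameter are assumptions made for the example, and missing values are simply skipped.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_grid(plots, cell_size):
    """Group plot values into square grid cells and summarise each cell.

    plots: iterable of (x, y, value) tuples; value may be None (missing).
    Returns {(col, row): {"mean": ..., "max": ..., "count": ...}}.
    """
    cells = defaultdict(list)
    for x, y, value in plots:
        if value is None:  # skip missing observations
            continue
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append(value)
    return {
        key: {"mean": mean(vals), "max": max(vals), "count": len(vals)}
        for key, vals in cells.items()
    }


# Hypothetical plots: two fall into cell (0, 0), one valid plot into cell (1, 0).
plots = [(0.2, 0.3, 10.0), (0.8, 0.1, 30.0), (1.5, 0.4, 25.0), (1.7, 0.9, None)]
summary = aggregate_by_grid(plots, cell_size=1.0)
```

Running the same aggregation once per species and once per year would give one value per cell for each map in an animation or map sequence, so that symbols no longer overlap.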

Explore evolution of spatial patterns: a) Animated map; b) Map sequence. Observations: Persistently high values in Poland. Improvement in Belarus. Mosaic distribution in most countries: great differences between close locations. Outliers.

Divide and Focus: Exploration on country level. Advisable because of inconsistencies between countries. Observation: abrupt changes between locations → spatial smoothing methods are not appropriate.

Explore spatial distribution of temporal behaviours: Are behaviours in neighbouring places similar? Step 1. Smoothing helps to reveal general patterns and to disregard fluctuations and outliers (we shall look at outliers later).
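The slide does not say which smoothing method was used. As one possible reading, a centred moving average over each time series damps single-year spikes; the sketch below assumes that method, and the name `moving_average` and the `window` parameter are illustrative only. Missing years are left out of each window.

```python
def moving_average(series, window=3):
    """Centred moving average that ignores missing (None) values.

    Each output value is the mean of the available values within
    window // 2 positions on either side of the current year.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        neighbours = [
            v for v in series[max(0, i - half): i + half + 1] if v is not None
        ]
        smoothed.append(sum(neighbours) / len(neighbours) if neighbours else None)
    return smoothed


# Hypothetical series with one gap and one outlier (40).
series = [10, 12, None, 40, 14, 16]
smoothed = moving_average(series, window=3)
```

A wider window flattens the outlier more strongly but also hides short-lived changes, so the window length is itself something to vary during exploration.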

Explore spatial distribution of temporal behaviours: Are behaviours in neighbouring places similar? Step 2. Temporal comparison (e.g. with a particular year or with the mean for a period) helps to disregard absolute differences in values and thus focus on behaviours. Observation: no strong similarity between neighbouring places.
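Temporal comparison can be illustrated by replacing each value with its difference from a reference. The sketch below is an assumption about how such a transformation might look: `compare_to_reference` is a hypothetical name, and the reference is either one chosen year or, by default, the mean of the available values.

```python
def compare_to_reference(series, reference_index=None):
    """Return differences between each value and a reference value.

    reference_index: position of the reference year; if None, the mean of
    the non-missing values is used. Missing values stay None.
    """
    present = [v for v in series if v is not None]
    if reference_index is None:
        reference = sum(present) / len(present) if present else None
    else:
        reference = series[reference_index]
    if reference is None:  # nothing to compare with
        return [None] * len(series)
    return [None if v is None else v - reference for v in series]


# Two hypothetical plots with different levels but the same behaviour.
low = [10, 15, 20]
high = [50, 55, 60]
low_vs_first = compare_to_reference(low, reference_index=0)
high_vs_first = compare_to_reference(high, reference_index=0)
low_vs_mean = compare_to_reference(low)
```

After the transformation both plots show the same curve, which is the point of the step: absolute levels drop out and only the shape of the behaviour remains.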

Compare behaviours in plots with different main species. Mosaic signs: –6 rows for species; –14 columns for years; –colours encode defoliation values. Observation: behaviours differ for different main species.

Explore overall temporal trends: Line overlapping obstructs data analysis → apply aggregation.

Aggregation method 1: by quantiles
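One plausible reading of aggregation by quantiles, assumed here, is that for every year the values of all series are summarised by their quantile cut points, so that a few band boundaries are drawn instead of thousands of lines. The function name `quantile_bands` and the table layout are hypothetical; the cut points come from the standard library's `statistics.quantiles`.

```python
from statistics import quantiles


def quantile_bands(table, n=4):
    """Compute per-year quantile cut points across all series.

    table: {series_id: [value per year, or None if missing]}.
    Returns one list of n - 1 cut points per year, or None for a year
    with fewer than two available values.
    """
    n_years = len(next(iter(table.values())))
    bands = []
    for year in range(n_years):
        values = [s[year] for s in table.values() if s[year] is not None]
        if len(values) >= 2:
            bands.append(quantiles(values, n=n, method="inclusive"))
        else:
            bands.append(None)
    return bands


# Hypothetical table: five plots, two years, one missing value.
table = {
    "a": [10, 5],
    "b": [20, 15],
    "c": [30, None],
    "d": [40, 25],
    "e": [50, 35],
}
bands = quantile_bands(table, n=4)
```

With `n=4` the three cut points per year are the quartiles; `n=10` would give deciles, which is closer to a "topmost 10%" style of selection.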

Aggregation method 2: by intervals
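Aggregation by intervals can be sketched as counting, for every year, how many series fall into each fixed value interval. The interval edges below are made up for the example and are not taken from the presentation; `interval_counts` is a hypothetical name.

```python
from bisect import bisect_right


def interval_counts(table, edges):
    """Count series per value interval for every year.

    table: {series_id: [value per year, or None if missing]}.
    edges: ascending interval boundaries; a value equal to an edge is
    counted in the interval above that edge.
    Returns a list with one list of len(edges) + 1 counts per year.
    """
    n_years = len(next(iter(table.values())))
    counts = [[0] * (len(edges) + 1) for _ in range(n_years)]
    for series in table.values():
        for year, value in enumerate(series):
            if value is not None:
                counts[year][bisect_right(edges, value)] += 1
    return counts


# Hypothetical data and edges: intervals <10, 10 to <25, 25 to <60, >=60.
table = {"a": [5, 30], "b": [12, None], "c": [70, 25]}
counts = interval_counts(table, edges=[10, 25, 60])
```

Unlike quantile bands, the interval boundaries stay the same in every year, so changes in the counts show whether the overall distribution of values shifts over time.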

Divide and Focus: Germany

Divide and Focus: age groups 1,3

Attend to particulars. Types of particulars (examples): –Extreme values –Extreme changes –High variability –… Questions: –When? –Where? –What is around? –Why? (a question for further, in-depth analysis). Domain knowledge is essential.

Attend to particulars: extreme values. 1. Click on a segment corresponding to extreme values. 2. The behaviour(s) is (are) highlighted on the time graph. 3. The location(s) is (are) highlighted on the map.

Attend to particulars: what is around? In some neighbouring places the behaviours during the period are somewhat similar

Attend to particulars: extreme changes. 1. Transform the time graph to show changes. 2. Select extreme changes in a specific year (here 2003).
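Outside an interactive tool, the two steps could be approximated as follows: compute each plot's change from the previous year, then keep the plots whose change in the chosen year is large. The function names, the table layout and the threshold are assumptions for illustration.

```python
def changes_into_year(table, years, year):
    """Return {plot_id: value in `year` minus value in the previous year}.

    Plots with a missing value in either of the two years are left out.
    """
    i = years.index(year)
    if i == 0:
        raise ValueError("no previous year to compare with")
    result = {}
    for plot_id, series in table.items():
        before, after = series[i - 1], series[i]
        if before is not None and after is not None:
            result[plot_id] = after - before
    return result


def extreme_changes(table, years, year, threshold):
    """Keep plots whose absolute change into `year` is at least `threshold`."""
    changes = changes_into_year(table, years, year)
    return {pid: d for pid, d in changes.items() if abs(d) >= threshold}


# Hypothetical defoliation values for four plots.
years = [2001, 2002, 2003]
table = {
    "p1": [20, 22, 60],
    "p2": [30, 31, 29],
    "p3": [15, None, 50],
    "p4": [70, 65, 30],
}
selected = extreme_changes(table, years, year=2003, threshold=25)
```

Plot `p3` is not selected even though its values differ strongly, because its 2002 value is missing; with many gaps in the data, such cases need a separate look.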

Attend to particulars: high variation. 1. Aggregate the time graph by quantiles. 2. Save counts. 3. Visualise, e.g. on a scatter plot. 4. Select items with high variation.
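The slide does not define what the saved counts are. The sketch below assumes they record, for each series, in how many years it fell into each quantile class; a series that visits several classes then counts as highly variable. The rank-based classification and all names are illustrative assumptions.

```python
def quantile_classes(table, n=4):
    """Assign each value a quantile class (0 .. n-1) within its year.

    Classes are rank-based: ties are broken by series id, and missing
    values keep the class None.
    """
    ids = list(table)
    n_years = len(table[ids[0]])
    classes = {sid: [None] * n_years for sid in ids}
    for year in range(n_years):
        present = sorted(
            (table[sid][year], sid) for sid in ids if table[sid][year] is not None
        )
        for rank, (_, sid) in enumerate(present):
            classes[sid][year] = rank * n // len(present)
    return classes


def class_counts(classes, n=4):
    """Count, per series, the number of years spent in each class."""
    return {sid: [seq.count(k) for k in range(n)] for sid, seq in classes.items()}


# Hypothetical table: "a" and "d" swap between bottom and top classes.
table = {"a": [10, 40, 10], "b": [20, 20, 20], "c": [30, 30, 30], "d": [40, 10, 40]}
classes = quantile_classes(table, n=4)
counts = class_counts(classes, n=4)
varied = sorted(sid for sid, c in counts.items() if sum(1 for k in c if k) >= 2)
```

Plotting, say, the count for the lowest class against the count for the highest class on a scatter plot would place the stable series on the axes and the variable ones away from them.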

Attend to particulars: high fluctuation. Select items with the maximal number of jumps between quantiles.
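Counting jumps can be sketched as follows, assuming each series has already been reduced to a sequence of quantile class numbers per year (with `None` for missing years). The input sequences and the names `count_jumps` and `most_fluctuating` are hypothetical.

```python
def count_jumps(class_sequence):
    """Count changes of quantile class between consecutive available years."""
    present = [c for c in class_sequence if c is not None]
    return sum(1 for a, b in zip(present, present[1:]) if a != b)


def most_fluctuating(class_table):
    """Return the ids with the maximal number of jumps, and that number."""
    jumps = {sid: count_jumps(seq) for sid, seq in class_table.items()}
    top = max(jumps.values())
    return sorted(sid for sid, j in jumps.items() if j == top), top


# Hypothetical quantile classes per year for four series.
class_table = {
    "a": [0, 3, 0, 3],
    "b": [1, 1, 1, 1],
    "c": [2, None, 3, 3],
    "d": [3, 0, 3, 0],
}
selected, top = most_fluctuating(class_table)
```

This measure differs from high variation: a series that moves once from the lowest to the highest class has a large range but only one jump.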

Attend to particulars: stable extremes. Select items that are always in the topmost 10%.
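A rough sketch of this selection is given below. It assumes that a series is in the top fraction of a year when fewer than that fraction of the available values are strictly greater, and that years in which the series itself is missing are ignored; both rules, and the name `always_in_top`, are assumptions rather than the tool's documented behaviour.

```python
from bisect import bisect_right


def always_in_top(table, fraction=0.10):
    """Return ids of series that are in the top `fraction` in every year
    for which they have a value (series with no values are excluded)."""
    ids = list(table)
    n_years = len(table[ids[0]])
    stable = {sid for sid in ids if any(v is not None for v in table[sid])}
    for year in range(n_years):
        present = {
            sid: table[sid][year] for sid in ids if table[sid][year] is not None
        }
        values = sorted(present.values())
        for sid, v in present.items():
            greater = len(values) - bisect_right(values, v)  # strictly greater
            if greater >= fraction * len(values):
                stable.discard(sid)
    return sorted(stable)


# Hypothetical table: ten plots, three years; p9 is highest whenever present.
table = {f"p{i}": [10 * i, 10 * i + 1, 10 * i + 2] for i in range(10)}
table["p9"][1] = None
stable = always_in_top(table, fraction=0.10)
```

Plot `p8` tops the second year only because `p9` is missing there, and it is still rejected for the other years, which shows how missing values can distort such a ranking.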

Attend to particulars: stable increase. 1. Switch the time graph to segmentation mode. 2. Choose “increase” and set the minimum difference. 3. Select a sequence of years by clicking. 4. Check sensitivity to the time period!
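The selection could be approximated in code as shown below. The sketch assumes that the minimum difference applies to every pair of consecutive years in the chosen period and that a series with a gap inside the period is not selected; the slide does not confirm either rule, and the function names are hypothetical.

```python
def increases_steadily(series, start, end, min_diff):
    """True if every year-to-year step from index start to end is >= min_diff.

    A series with a missing value inside the period is not selected.
    """
    if end <= start:
        raise ValueError("the period must span at least two years")
    window = series[start:end + 1]
    if any(v is None for v in window):
        return False
    return all(b - a >= min_diff for a, b in zip(window, window[1:]))


def select_increasing(table, start, end, min_diff):
    """Return the ids of all series that increase steadily over the period."""
    return sorted(
        sid for sid, s in table.items() if increases_steadily(s, start, end, min_diff)
    )


# Hypothetical values for two plots over five years.
table = {"x": [10, 15, 21, 27, 20], "y": [10, 11, 12, 13, 30]}
first_four_years = select_increasing(table, 0, 3, min_diff=5)
all_five_years = select_increasing(table, 0, 4, min_diff=5)
```

Extending the period by a single year empties the selection in this example, which is the kind of sensitivity to the time period that the last step asks to check.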

Conclusions: the Data. This dataset is not suitable for application of major statistical analysis methods due to –absence of spatial & temporal smoothness –skewed distributions –outliers –missing values. The data may be suitable for other purposes (e.g. in the context of a broader study of the ecological situation over Europe) –EDA methods can promote insights.

Recap: Exploration procedure. See the whole –Evolution of spatial patterns in time –Distribution of temporal behaviours in space. Divide and focus –Data were explored by slices and subsets (species, age groups, countries, years, …). Attend to particulars –Extreme values, extreme changes, high variation, high fluctuations, stable growth …

Recap: Tools. Visualisation on thematic maps, time graphs, other aspatial displays. Aggregation: reduce data volume & symbol overlapping. Filtering: divide and focus (select subsets). Marking: see corresponding data on different displays. Data transformation: smoothing, computing changes, normalisation etc. It is important to use the tools in combination.

Further information. Software: Scientific issues (papers, tutorials, demos): Book to appear: N. and G. Andrienko, “Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach” (Springer-Verlag, end 2005). A systematic approach to defining tasks, tools, and principles of EDA.
