Spatial Statistics and Spatial Knowledge Discovery. First law of geography [Tobler]: "Everything is related to everything else, but near things are more related than distant things." "Drowning in data yet starving for knowledge" [Naisbitt-Rogers]. Lecture 0: Introduction. Pat Browne

Introduction There are vast amounts of spatially related data available from government departments (e.g. CSO, agriculture, environment), local authorities, health boards, and private industry. This module studies techniques to analyze these large and diverse data sets with a view to gleaning new and useful information. We use two closely related approaches. Munge = to imperfectly transform information.

Introduction Firstly, we study how basic statistical techniques such as correlation and regression can be adapted to handle spatial data. Secondly, we study how basic knowledge discovery techniques, such as association rules, can be used in location-based analysis.

AIMS The aim is to equip the student with the skills necessary to extract decision support information from large datasets using statistical and knowledge discovery techniques. We will study the techniques and software needed to analyze large spatial data sets.

OUTCOMES On successful completion of the module the students will be able to:
1. Use basic descriptive statistics to describe spatial data.
2. Use inferential statistics and probability to help make inferences, judgments, and decisions.
3. Use statistical packages to analyze spatial data.
4. Use data mining software to assist in knowledge discovery.
5. Use data mining and statistical software for decision support.

MODULE CONTENT Basic statistics and probability, e.g. mean, variance, standard deviation, sampling, correlation and regression. Spatial autocorrelation and spatial regression. Association rules and other techniques for data mining and spatial data mining. A variety of spatial statistical techniques, for example spatial point patterns, spatial interpolation, and analysis of grids and surfaces. The use of statistical packages for spatial analysis. The use of data mining software in a spatial context.

Early Spatial Analysis http://en.wikipedia.org/wiki/John_Snow_%28physician%29 http://en.wikipedia.org/wiki/Spatial_analysis

Knowledge Discovery (or Data mining) What is data mining? The non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Data mining finds valuable information hidden in large volumes of data. Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. The computer is responsible for finding the patterns by identifying the underlying rules and features in the data. It is possible to "strike gold" in unexpected places, as the data mining software extracts patterns that were not previously discernible, or that were so obvious that no one had noticed them before.

Knowledge Discovery (or Data mining) Data mining (DM) lies at the intersection of database management, statistics, machine learning and artificial intelligence. DM provides semi-automatic techniques for discovering unexpected patterns in very large data sets.

Descriptive and Predictive DM

Descriptive Data Mining Descriptive analysis is an analysis that results in some description or summarization of data. It characterizes the properties of the data by discovering patterns that would be difficult for a human analyst to identify by eye or by using standard statistical techniques. Description involves identifying rules or models that describe data (e.g. 15% of those who buy ice cream also buy wafers).

Descriptive Data Mining Clustering (unsupervised learning) is a descriptive data mining technique. Clustering is the task of assigning cases to groups (clusters) so that the cases within a group are similar to each other and as different as possible from the cases in other groups. Clustering can identify groups of customers with similar buying patterns, and this knowledge can be used to help promote certain products. Clustering can help locate the crime ‘hot spots’ in a city.
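The slide does not name a clustering algorithm. As one possible illustration, here is a minimal k-means sketch in plain Python. The incident coordinates are made up, and seeding the centroids with the first k points is a simplifying assumption, not a recommended initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on 2D points; returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points (simple, deterministic).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical incident locations (x, y) forming two loose groups.
incidents = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.3), (8.8, 9.1), (9.2, 8.7)]
centroids, labels = kmeans(incidents, k=2)
print(labels)     # cluster index for each incident
print(centroids)  # one centre per candidate 'hot spot'
```

Each centroid can be read as the centre of a candidate hot spot. Real crime data would need a principled choice of k and a projected coordinate system so that distances are meaningful.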

Descriptive Data Mining Association Rules. Association rule discovery (ARD) identifies the logical relationships within data. A rule can be expressed as a predicate of the form (IF x THEN y). ARD can identify product lines that are bought together in a single shopping trip by many customers, and a supermarket chain can use this knowledge to help decide on the layout of its product lines.

Association Rule Example
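The figure from the original slide is not reproduced in this transcript. As a stand-in, the sketch below computes the two usual measures of a rule IF x THEN y: support (how often x and y occur together) and confidence (how often y occurs given x). The baskets and the function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Confidence of IF antecedent THEN consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical shopping baskets, one set of items per shopping trip.
baskets = [
    {"ice cream", "wafers"},
    {"ice cream", "milk"},
    {"ice cream", "wafers", "milk"},
    {"bread", "milk"},
    {"ice cream"},
]

rule_support = support(baskets, {"ice cream", "wafers"})
rule_confidence = confidence(baskets, {"ice cream"}, {"wafers"})
print(f"IF ice cream THEN wafers: support={rule_support:.2f}, "
      f"confidence={rule_confidence:.2f}")
```

In this made-up data two of the five baskets contain both items, and two of the four baskets with ice cream also contain wafers. A statement such as "15% of those who buy ice cream also buy wafers" is a confidence value of this kind.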

Predictive Data Mining Predictive DM results in some description or summarization of a sample of data which predicts the form of unobserved data. Prediction involves building a set of rules or a model that will enable unknown or future values of a variable to be predicted from known values of another variable.

Predictive Data Mining Classification is a predictive data mining technique. Classification is the task of finding a model that maps (classifies) each case into one of several predefined classes. Classification is used in risk assessment in the insurance industry.

Predictive Data Mining Regression analysis is a predictive data mining technique that uses a model to predict a value. Regression can be used to predict sales of new product lines based on advertising expenditure.

Linear regression: Example The linear regression model AnnualSpending = a + b × Income fits the amount customers spend in a supermarket as a linear function of their income, where a (the intercept) and b (the slope) are found by the data mining algorithm. If the model is reasonably accurate, values of AnnualSpending (Y) can be predicted (or calculated) from values of Income (X).
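One common way to find the intercept a and the slope b is ordinary least squares. The sketch below assumes that method (the slide does not say which algorithm is used) and uses made-up income and spending figures that lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, in thousands: Income (X) and AnnualSpending (Y).
income = [20, 30, 40, 50, 60]
annual_spending = [3.0, 4.0, 5.0, 6.0, 7.0]

a, b = fit_line(income, annual_spending)
predicted = a + b * 45  # predict spending for an income of 45
print(a, b, predicted)
```

Because the invented data are exactly linear, the fit recovers an intercept of about 1 and a slope of about 0.1. With real data the points scatter around the line and the prediction carries error.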

Statistical Techniques Data mining uses statistical concepts and techniques, e.g. mean, standard deviation, population distribution, probability, sampling. There are differences between DM and statistics: DM is a process requiring many steps, such as data cleaning. Data mining can be used as a prelude to a more formal statistical study (hypothesis discovery).

Statistical techniques for spatial data. There are special spatial statistical techniques, e.g. interpolation (what is the likely value at a point?). Many standard statistical techniques can also be adapted for spatial applications (e.g. Moran’s I adapts correlation). These adaptations usually involve including a weight matrix representing location in the basic formula.
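To make the interpolation question concrete, here is a minimal sketch of inverse distance weighting (IDW), one simple interpolation method. The slide does not name a method, so IDW, the `power` parameter and the sample values are all assumptions made for illustration.

```python
import math

def idw(samples, target, power=2):
    """Estimate the value at `target` from ((x, y), value) samples.

    Nearer samples get larger weights (1 / distance**power), in the
    spirit of Tobler's first law.
    """
    numerator = denominator = 0.0
    for location, value in samples:
        d = math.dist(location, target)
        if d == 0:
            return value  # target coincides with a sample point
        weight = 1 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical measurements at two known locations.
samples = [((0, 0), 10.0), ((2, 0), 20.0)]
midpoint_estimate = idw(samples, (1, 0))   # equally far from both samples
near_first = idw(samples, (0.5, 0))        # closer to the first sample
print(midpoint_estimate, near_first)
```

At the midpoint both samples receive equal weight, so the estimate is their average. Moving the target toward one sample pulls the estimate toward that sample's value.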

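Spatial autocorrelation [Figure: grids of blue and white cells arranged along a scale from negative spatial autocorrelation (dispersed), through spatial independence, to positive spatial autocorrelation (spatial clustering).] Join types: BB = blue beside blue, BW = blue beside white, WW = white beside white. Each grid has 32 white cells and 32 blue cells = 64 cells.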
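The BB, BW and WW joins can be counted directly. The sketch below assumes an 8 × 8 layout for the 64 cells and rook adjacency (cells sharing an edge). Neither is stated on the slide, and both example grids are made up.

```python
def join_counts(grid):
    """Count BB, BW and WW joins between edge-adjacent cells (1 = blue, 0 = white)."""
    counts = {"BB": 0, "BW": 0, "WW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so each join is counted once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    blues = grid[r][c] + grid[nr][nc]
                    key = "BB" if blues == 2 else "WW" if blues == 0 else "BW"
                    counts[key] += 1
    return counts

# Two hypothetical grids, each with 32 blue and 32 white cells.
dispersed = [[(r + c) % 2 for c in range(8)] for r in range(8)]       # checkerboard
clustered = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]  # blue left half

dispersed_joins = join_counts(dispersed)
clustered_joins = join_counts(clustered)
print(dispersed_joins)
print(clustered_joins)
```

In the checkerboard every join is BW, the signature of negative spatial autocorrelation. In the half-and-half grid almost all joins are BB or WW, the signature of positive spatial autocorrelation.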

Moran’s I – Same Mean & SD, but different spatial configurations.
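The point can be checked numerically with a minimal Moran's I sketch. It assumes binary rook weights (w_ij = 1 when cells i and j share an edge, otherwise 0) and two made-up 8 × 8 grids of 0/1 values. It computes I = (n / S0) × Σ w_ij (x_i − m)(x_j − m) / Σ (x_i − m)², where m is the mean and S0 is the sum of all weights.

```python
import statistics

def morans_i(grid):
    """Moran's I for a rectangular grid using binary rook-adjacency weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = statistics.fmean(values)
    sum_sq = sum((v - mean) ** 2 for v in values)
    cross = 0.0  # sum of w_ij * (x_i - mean) * (x_j - mean)
    s0 = 0       # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols:
                    cross += (grid[r][c] - mean) * (grid[nr][nc] - mean)
                    s0 += 1
    return (len(values) / s0) * (cross / sum_sq)

# Hypothetical grids with identical values in different arrangements.
dispersed = [[(r + c) % 2 for c in range(8)] for r in range(8)]       # checkerboard
clustered = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]  # two solid halves

for name, grid in (("dispersed", dispersed), ("clustered", clustered)):
    flat = [v for row in grid for v in row]
    print(name, statistics.fmean(flat), statistics.pstdev(flat), round(morans_i(grid), 3))
```

Both grids have the same mean and standard deviation. The checkerboard gives a strongly negative I and the two-halves grid a strongly positive one, so the difference comes entirely from the spatial arrangement.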

References
Lloyd: Spatial Data Analysis
Bivand, Pebesma, Gómez-Rubio: Applied Spatial Data Analysis with R
http://www.spatial.cs.umn.edu/Book/
http://www.manning.com/obe/
