Published by Renee Vipond. Modified about 1 year ago.

1
Data Mining in Spatial Data Sets
Hemant Kumar Jerath, B.Tech., MS Project Student, Mangalore University
Advisors: Dr. B.K. Mohan & Dr. (Mrs.) P. Venkatachalam, CSRE, IIT Bombay

2
Contents
Data Management System
Data Mining: Concepts, Algorithms & Tasks
Data Warehouse
OLAP (On-line Analytical Processing)
Knowledge Discovery Process
Spatial Data Warehouse & OLAP
Spatial Data Mining: Concept & Definition
Case Studies: Data Mining Software
Spatial Data Mining: Software Architecture

3
[Diagram labels: Database Management System, Data Warehouse, OLAP; SQL query interface; output/knowledge (explicit, trivial knowledge)]

4
Data mining techniques offer a way to explore implicit knowledge.
DBMS vs. data mining?
DBMS: SQL-driven exploration
Data mining: automatic exploration

5
Data Mining
Definition: Data mining is the analysis of (often large) observational data sets to find implicit relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. [Hand et al.]

6
Keywords in the Definition
Large data sets
Observational data: as opposed to experimental data
Relationships and summaries: referred to as models and patterns
– e.g. linear equations, rules, clusters, graphs, tree structures and recurrent patterns in time series

7
[Diagram labels: data tombs, DATA MINING, golden nuggets, implicit knowledge]
Transform your data into critical knowledge

8
Data Mining: A Confluence of Multiple Disciplines
Statistics
Information Theory
Machine Learning
Artificial Intelligence

9
Knowledge Discovery Process (KDD)
[Figure label: phase of real discovery]

10
Data Preprocessing
Data Cleaning
– Missing values
– Noisy data: binning, clustering, combined computer and human interaction, regression
– Inconsistent data
Data Integration and Transformation
– Data Integration
– Data Transformation

11
Data Preprocessing (continued)
Data Transformation
– Smoothing
– Aggregation
– Generalization
– Normalization
– Attribute construction
Data Reduction
– Data cube aggregation
– Dimension reduction
– Data compression
– Numerosity reduction
– Discretization and concept hierarchy generation
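Two of the steps named on the preprocessing slides, smoothing noisy data by binning and normalization, can be illustrated with a minimal sketch. The function names, the equal-frequency binning choice and the sample values are assumptions made for illustration; the slides do not prescribe a particular variant.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sample data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)
normalized = min_max_normalize([10, 20, 30])
print(smoothed_prices)
print(normalized)
```

Smoothing by bin means trades detail for noise reduction, while min-max normalization only rescales and keeps the ordering of the values.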

12
Data Warehouse
Definition: A data warehouse is a subject-oriented, integrated (from heterogeneous sources), time-variant and non-volatile collection of data in support of management's decision-making process. [W.H. Inmon]

13
STAR SCHEMA

14
SNOWFLAKE SCHEMA

15
Data Cube Technology
[Figure: a data cube whose cells are indexed by (address, time, item)]

16
OLAP Operations
Roll up (drill up): summarize data by climbing up a hierarchy or by dimension reduction
Drill down (roll down): the reverse of roll-up; move from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions
Slice and dice: project and select
Pivot (rotate): reorient the cube for visualization; 3D to a series of 2D planes
Other operations
– Drill across: involves (across) more than one fact table
– Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL)
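As a rough illustration of roll-up and slice, the sketch below treats a cube as a dictionary keyed by (city, quarter, item). The fact table, the dimension names and the helper functions are hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: (city, quarter, item) -> units sold.
DIMENSIONS = ("city", "quarter", "item")
facts = {
    ("Mumbai", "Q1", "phone"): 10,
    ("Mumbai", "Q2", "phone"): 12,
    ("Mumbai", "Q1", "laptop"): 4,
    ("Pune", "Q1", "phone"): 7,
    ("Pune", "Q2", "laptop"): 3,
}


def roll_up(cube, keep):
    """Roll up by dimension reduction: sum over every dimension not in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cube.items() if k[i] == value}


by_city = roll_up(facts, ["city"])           # total units per city
q1 = slice_cube(facts, "quarter", "Q1")      # only first-quarter cells
q1_by_item = roll_up(q1, ["item"])           # slice, then roll up
print(by_city, q1_by_item)
```

Drill-down is the inverse direction: starting from `by_city`, one would return to the finer-grained `facts` cells that produced each total.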

17
Drill Down Operation Roll Up Operation

18
Mining technology today
Data warehouse
Preprocessing utilities
– Extract data via ODBC
– Sampling
– Attribute transformation
Mining operations
– Scalable algorithms: association, classification, clustering, sequence mining
Visualization tools
Vendors (IDC 1999)
– SAS: 29%
– SPSS: 13.5%
– IBM: 6%

19
Data Mining Algorithms Definition: A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns.

20
Data Mining Algorithms Reductionist approach: A data mining algorithm can be thought of as a 'tuple' consisting of: {model structure, score function, search method, data management techniques}

21
Algorithm: CART | Backpropagation | A Priori
Task: Classification & regression | Regression | Rule pattern discovery
Structure: Decision tree | Neural network | Association rules
Score function: Cross-validated loss function | Squared error | Support/accuracy
Search method: Greedy search over structures | Gradient descent on parameters | Breadth-first with pruning
DMT*: Unspecified | Unspecified | Linear scans
* Data Management Technique

22
So ultimately we can generate a potentially infinite number of algorithms by combining different model structures, score functions, search methods and data management techniques.

23
Data Mining Task Taxonomy
Prediction: use some variables to predict unknown or future values of other variables
– Classification, regression and deviation detection
Description: find human-interpretable patterns that describe the data
– Clustering, association rule discovery, sequential rule discovery

24
Data Mining Task Taxonomy
Verification model: affirm or negate a hypothesis (an iterative process of progressively refining the hypothesis)
Discovery-driven model: the system automatically finds the information

25
Mining operations
Classification
– Regression
– Classification trees
– Neural networks
– Bayesian learning
– Nearest neighbor
– Radial basis functions
– Support vector machines
– Meta learning methods: bagging, boosting
Clustering
– Hierarchical
– EM
– Density based
Sequence mining
– Time series similarity
– Temporal patterns
Item set mining
– Association rules
– Causality
Sequential classification
– Graphical models
– Hidden Markov Models

26
Mining Tasks
Discovery of association rules: X ⇒ Y (s%, c%)
s: support
c: confidence
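Support and confidence for a rule X ⇒ Y can be computed directly from their usual definitions: support is the fraction of transactions that contain both X and Y, and confidence is that fraction divided by the support of X alone. The basket data below is made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y) / support(X) for the rule X => Y."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})        # 2 of 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})   # 2 of the 3 bread baskets
print(f"bread => milk  (s={s:.0%}, c={c:.0%})")
```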

27
Mining Tasks (continued)
Clustering
Criteria:
i. Available similarity
ii. Set function (optimizing technique)
Land use: finding similar areas of land use in an earth observation database
City planning: identifying groups of houses according to their house type, value and geographic location

28

29
Mining Tasks (continued)
Classification
– Finding rules to partition data into disjoint groups

30
Classification
Given old data about customers and payments, predict a new applicant's loan eligibility.
[Diagram: previous customers (age, salary, profession, location, customer type) → classifier → decision rules (Salary > 5 L, Prof. = Exec); new applicant's data → good/bad]

31
Classification vs. Clustering
Clustering: the method generates the class labels itself. [descriptive]
Classification: class labels are allocated to the data through a classifier. [predictive]

32
Frequent Episodes
Sequences of events that occur frequently; mainly used for temporal data.

33
Deviation detection Identification of outliers

34
Sequence Mining Sequence of occurrence of the associative rules.

35
Spatial Data Mining

36
Definition: Spatial data mining is the extraction of implicit knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases.

37
What is the difference between data mining and spatial data mining?
Data mining:
– Non-spatial attributes
Spatial data mining:
– Integration of both spatial and non-spatial dimensions in various KDD algorithms
– Spatial attributes (use of thematic maps)
– Non-spatial attributes (relational database)

38
Spatial Data Models
Raster model: pixel data sets
Vector model: point, line and polygon objects

39
Fundamental operations on vector data sets
Spatial relations with neighbors are an important aspect of spatial data mining:
– distance between points
– area of an object (a polygon)
– length of a chain or polygon
– intersection or union of objects
– mutual position of objects (they can intersect, overlap or touch)
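The first three operations in this list reduce to short formulas for planar coordinates. The sketch below assumes simple (non-self-intersecting) polygons given as ordered vertex lists in a flat Cartesian system; the function names are illustrative only.

```python
import math


def distance(p, q):
    """Euclidean distance between two points (x, y)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.
    Vertices are listed in order; the ring is closed implicitly."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Length of the closed boundary of a polygon."""
    n = len(vertices)
    return sum(distance(vertices[i], vertices[(i + 1) % n]) for i in range(n))


# Hypothetical 4 x 3 rectangular parcel.
parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(distance((0, 0), (3, 4)), polygon_area(parcel), polygon_perimeter(parcel))
```

Intersection, union and topological tests (touch, overlap) need a proper geometry library and are outside the scope of this sketch.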

40
[Architecture diagram labels: SOLAP, ArcSDE, Data Mining, Spatial and Non-spatial Data Warehouse, attribute data, shape files]

41
Spatial Warehouse and OLAP
Definition: A spatial data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of both spatial and non-spatial data in support of management's decision-making process.

42
SOLAP and SDW: Issues
Spatial data format
– Structure specific
– Vendor specific
OLAP processing
– Spatial indexing
– Access methods

43
Construction of Spatial Warehouse and OLAP
Spatial data cube model
– Use of spatial dimensions in the cube
Star/snowflake model

44
Star Model of a spatial data warehouse: BC_weather

45
Concept Hierarchies
Agriculture
– Cash crop
– Grains: rice (jasmine, basmati), wheat
– Fruits: mango, kiwi
– Vegetation: kale, tomato

46
The hierarchy of topological relations
[Figure: tree of relations with the labels g_close_to; not_disjoint, close_to; intersects, inside, contains, equal, adjacent_to; intersects, covers, contains]

47
Modeling Dimensions: Spatial Data Cube
Non-spatial dimension
– e.g. temperature and precipitation, generalized to hot, wet
Spatial-to-non-spatial dimension
– e.g. pacific_northwest, big_state
Spatial-to-spatial dimension

48
What can we measure in a spatial data cube?
Numerical measure
– e.g. monthly revenue of a region; a roll-up may give the total revenue of the region
Spatial measure
– a collection of pointers to spatial objects
– generalization (roll-up): regions with the same temperature and precipitation are grouped together

49
Spatial Data Mining: A Database Approach
Martin Ester, Hans-Peter Kriegel, Jorg Sander
Step I: discover centers based on some non-spatial attribute [clustering; descriptive mining]
Step II: determine the (theoretical) trend of some non-spatial attribute
Step III: discover deviations from the theoretical trend
Step IV: explain the deviations by other spatial objects, e.g. the presence of some infrastructure

50

51

52
Spatial Association Rules
Spatial association rules look like this:
is_a(X, school) ∧ close_to(X, sports_centre) ⇒ close_to(X, parks) [0.5%, 80%]
is_a(X, city) ∧ within(X, bc) ∧ adjacent_to(X, water) ⇒ close_to(X, border) [0.5%, 92%]
Predicates such as:
– close_to, far_away
– intersect, overlap and disjoint
– left_of, west_of
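To show how a rule of this shape might be evaluated, the sketch below derives a `close_to` predicate from coordinates and measures support and confidence over the schools. Everything here is assumed for illustration: the objects, the 2 km threshold, and the choice to compute support relative to the reference class (schools) rather than all objects.

```python
import math

# Hypothetical objects: name -> (type, (x_km, y_km)). All values are made up.
OBJECTS = {
    "s1": ("school", (0, 0)),
    "s2": ("school", (10, 0)),
    "s3": ("school", (20, 0)),
    "c1": ("sports_centre", (0, 1)),
    "c2": ("sports_centre", (10, 1)),
    "p1": ("park", (1, 0)),
    "p2": ("park", (20, 1)),
}
CLOSE_KM = 2.0  # assumed distance threshold for close_to


def is_a(name, kind):
    return OBJECTS[name][0] == kind


def close_to(name, kind):
    """True if some other object of the given type lies within CLOSE_KM."""
    pos = OBJECTS[name][1]
    return any(
        other != name and k == kind and math.dist(pos, p) <= CLOSE_KM
        for other, (k, p) in OBJECTS.items()
    )


def rule_stats(reference_kind, antecedent, consequent):
    """Support and confidence of antecedent => consequent over one object type."""
    reference = [n for n in OBJECTS if is_a(n, reference_kind)]
    matched = [n for n in reference if antecedent(n)]
    both = [n for n in matched if consequent(n)]
    rule_support = len(both) / len(reference)
    rule_confidence = len(both) / len(matched) if matched else 0.0
    return rule_support, rule_confidence


# is_a(X, school) and close_to(X, sports_centre) => close_to(X, park)
rule_support, rule_confidence = rule_stats(
    "school",
    lambda x: close_to(x, "sports_centre"),
    lambda x: close_to(x, "park"),
)
print(rule_support, rule_confidence)
```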

53
……Continued Introduction to: – neighborhood graphs – neighborhood paths The predicate neighbor may be one of the neighborhood relations:

54
Top-Down Deepening Approach
Large patterns and strong implications are first found at a coarse level of detail.
The search then moves to lower levels of detail (progressive deepening).
The search continues until no large patterns are found.

55
Top-Down Deepening Approach
Optimization: search for large patterns at a high concept level first
– R-tree or plane-sweep techniques operating on MBRs (minimum bounding rectangles)
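The idea behind working on MBRs is that a cheap rectangle test can discard most object pairs before any exact geometry is computed. The sketch below shows only that filter step with a brute-force pair loop; it does not implement an R-tree or a plane sweep, and the object names and coordinates are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_intersect(a, b):
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(objects):
    """Coarse filter: keep only pairs whose MBRs intersect.
    Exact (expensive) geometry tests would run on these candidates only."""
    boxes = {name: mbr(pts) for name, pts in objects.items()}
    names = sorted(boxes)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if mbrs_intersect(boxes[p], boxes[q])
    ]


# Hypothetical polygons given by their vertices.
shapes = {
    "park": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "lake": [(3, 3), (6, 3), (6, 6), (3, 6)],
    "mall": [(10, 10), (12, 10), (12, 12)],
}
candidates = candidate_pairs(shapes)
print(candidates)
```

An R-tree or plane sweep replaces the quadratic pair loop with an index or a sorted scan, but the rectangle test itself stays the same.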

56
Generalization-based Spatial Data Mining
Nonspatial-dominant generalization
– (-9 C, -10 to 0 C) → COLD (attribute induction)
Spatial-dominant generalization
– Quad-trees and R-trees (attribute induction)

57
Spatial Clustering
Clustering algorithms can be applied to discover centers of high economic power.
– DBSCAN
– PAM, CLARA, CLARANS (spatial-data-dominant and non-spatial-data-dominant clustering)
– CLARANS (neighbor graphs)
– DBLEARN on non-spatial data
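Of the algorithms listed, DBSCAN is compact enough to sketch. The version below is a simplified, brute-force rendering of the density-based idea (core points with at least `min_pts` neighbors within `eps` grow a cluster; unreachable points are noise). It uses a linear scan instead of a spatial index, and the parameter values and points are assumed for illustration.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing
        cluster += 1
    return labels


# Hypothetical locations: two dense groups and one isolated point.
sites = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sites, eps=1.5, min_pts=3)
print(labels)
```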

58
Spatial Classification
Non-spatial attributes, e.g. number of salespersons in a store
Spatially related attributes with non-spatial values, e.g. population living within 1 km of a store
Spatial predicates, e.g.
– distance_less_than_10km(X, a)
Spatial functions, e.g. driving_distance(X, beach)

59
Decision Tree
Description of classified objects
Description of census block groups
Buffers are defined for the trade area

60
High_profit=N High_profit=Y

61
The classifier is built using the ID3 algorithm.
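ID3 chooses the attribute to split on by information gain, that is, the reduction in entropy of the class label. The sketch below shows only that selection step, not the full recursive tree construction; the store records and attribute names are invented, with `high_profit` echoing the leaf labels of the preceding slide.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """The attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical store records.
stores = [
    {"near_highway": "yes", "size": "large", "high_profit": "Y"},
    {"near_highway": "yes", "size": "small", "high_profit": "Y"},
    {"near_highway": "no", "size": "large", "high_profit": "N"},
    {"near_highway": "no", "size": "small", "high_profit": "N"},
]
gain_highway = information_gain(stores, "near_highway", "high_profit")
gain_size = information_gain(stores, "size", "high_profit")
root = best_attribute(stores, ["near_highway", "size"], "high_profit")
print(root, gain_highway, gain_size)
```

In this toy data `near_highway` separates the classes perfectly, so it has the maximum gain of one bit, while `size` carries no information about profit.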

62
Spatial Trend Detection
Trend: a temporal pattern
– network alarms
– recurrent illness
The algorithm computes the local change of the specified attribute when moving to the neighbors, as well as the distance to the neighbors.
– Linear regression is used to generate the trend
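One way to read this slide: for a start object, collect (distance, attribute change) pairs along its neighbors and fit a straight line; the slope is the trend. The sketch below uses ordinary least squares on made-up land prices and is an assumed simplification, not the published algorithm.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: land price observed at increasing distance (km)
# from a start object such as a city centre.
distances = [0, 1, 2, 3, 4]
prices = [100, 90, 80, 70, 60]

# Change of the attribute relative to the start object.
changes = [p - prices[0] for p in prices]

slope, intercept = linear_fit(distances, changes)
print(slope, intercept)  # a negative slope means prices fall with distance
```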

63
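Spatial Predictions
Location prediction
– Logistic Spatial Autoregressive Model (SAR): y = dWy + Xb + e, where W is the contiguity matrix, d the spatial autoregression coefficient, b the regression coefficients and e the error term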
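Because y appears on both sides of the SAR equation, it helps to solve for it explicitly. Keeping the slide's symbols (d is often written as ρ, b as β and e as ε elsewhere), and assuming the matrix I − dW is invertible:

```latex
\begin{aligned}
y &= dWy + Xb + e \\
(I - dW)\,y &= Xb + e \\
y &= (I - dW)^{-1}(Xb + e)
\end{aligned}
```

With d = 0 the spatial term vanishes and the model reduces to ordinary linear regression, y = Xb + e.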

64
Spatial Predictions (continued)
Spatial outlier detection techniques
– use of neighborhood graphs, paths and indices
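A common way to use a neighborhood graph for outlier detection is to compare each object's attribute value with the average over its neighbors and flag objects whose difference is unusually large. The sketch below follows that idea with a z-score on the differences; the threshold, the ring-shaped graph and the readings are assumptions, not the specific technique on the slide.

```python
import statistics


def spatial_outliers(values, neighbors, threshold=2.0):
    """Flag nodes whose value differs unusually from their neighbors' mean.

    values:    node -> attribute value
    neighbors: node -> list of adjacent nodes (the neighborhood graph)
    """
    diffs = {}
    for node, value in values.items():
        nbr_mean = statistics.mean([values[n] for n in neighbors[node]])
        diffs[node] = value - nbr_mean
    mu = statistics.mean(list(diffs.values()))
    sigma = statistics.pstdev(list(diffs.values()))
    if sigma == 0:
        return []  # every node matches its neighborhood equally well
    return [node for node, d in diffs.items() if abs(d - mu) / sigma > threshold]


# Hypothetical sensor readings on a ring of 9 stations; station 4 is off.
readings = {i: 10.0 for i in range(9)}
readings[4] = 50.0
ring = {i: [(i - 1) % 9, (i + 1) % 9] for i in range(9)}

outliers = spatial_outliers(readings, ring)
print(outliers)
```

Note that the neighbors of an outlier also show a shifted difference, which is why the threshold is applied to standardized differences rather than to raw values.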

65
GeoMiner Architecture

66
SPIN ARCHITECTURE

67
Weka 3: Machine Learning Software in Java
Machine learning algorithms for data mining problems
Weka contains tools for:
– data pre-processing
– classification
– regression
– clustering
– association rules
– visualization

68
SDAM Architecture
Use of the MLC++ library for implementing DM techniques

69
MLC++ extends: supervised machine learning, classification, accuracy estimation, cross-validation, bootstrap, decision trees, ID3, decision graphs, naive-bayes, decision tables, majority, induction algorithms, classifiers, categorizers, general logic diagrams, instance-based algorithms, discretization, lazy learning

70

71
Issues in Building a Spatial Data Mining Environment
Size of the database
Static or dynamic database
Testing the present spatial data structure for finding implicit relationships between spatial objects for mining tasks
Building the spatial data warehouse and spatial OLAP
Which data mining task?
Choosing the mining algorithm for a specific task, e.g. in the 10-year span since the concept of associative data mining was introduced, various algorithms have been developed
Which platform for implementing the mining algorithms: MLC++ on VC++ or Weka on Java?
