Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.

1 Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004

2 About … A form of geographical analysis Current topic of interest in GIS research (and database research and AI research) Finding hidden information in large collections of geographic data

3 This seminar Learning about a topic together Presenting to each other + interaction Added value by good examples: –for important concepts, algorithms –possibly devised yourself, or extended –referring to GIS data and issues (hence the GIS course prerequisite) Written assignment: joint survey

4 Material Book by Harvey Miller and Jiawei Han (editors): selected chapters Possibly: papers from conference proceedings Mostly provided by me

5 Weeks Week 36-46 Probably: –Not September 4 (this Thursday) –Not in week 40 (Sept. 29 & Oct. 2) –Not October 23 The above depending on participation!

6 Overview of Geographic Data Mining & Knowledge Discovery Chapter 1 of the book KDD: knowledge discovery in databases Data warehouses Data mining Geographic aspects of the above

7 Knowledge Discovery in Databases (KDD) Large databases contain interesting patterns: non-random properties and relationships that are: –valid (general enough to apply to new data) –novel (non-trivial and unexpected) –useful (leads to effective action: decision making or investigation) –ultimately understandable (simple, and interpretable by humans)

8 Knowledge Discovery in Databases (KDD) Because of quantity of data nowadays Because we want information, not data Because computing power allows it

9 KDD opposed to statistics Statistics –small and clean numeric database –scientifically sampled –specific questions in mind KDD: none of the above

10 KDD techniques Statistics Machine learning Pattern recognition Numeric search (?) Scientific visualization

11 Data warehouse Large repository of data For analytical processing (DB: transactional processing) Heterogeneous: different sources and formats (DB: homogeneous) Supports OLAP tools (OnLine Analytical Processing)

12 OLAP Example Measure of interest: sales Dimensions of interest: item, store, week (item, store, week) → money [quantity sold times price]

13 OLAP Example 2-dim. aggregation: (item, store, ·) → money Another 2-dim. aggregation: sales by store and by week 1-dim. aggregation: sales by week (all items and stores) Data cube: all 2^d possible aggregations, different types of summaries
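A minimal sketch of the data cube idea, assuming a tiny made-up sales table with d = 3 dimensions (item, store, week). The names `SALES`, `DIMS` and `data_cube` are hypothetical and only illustrate that every subset of the dimensions gives one aggregation, 2^3 = 8 in total.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (item, store, week, money)
SALES = [
    ("milk", "A", 1, 10.0),
    ("milk", "B", 1, 5.0),
    ("bread", "A", 1, 4.0),
    ("milk", "A", 2, 8.0),
]
DIMS = ("item", "store", "week")


def data_cube(rows, dims):
    """Return {kept_dimensions: {key: total money}} for every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in kept)  # values of the kept dimensions
                totals[key] += row[-1]             # sum the measure (money)
            cube[tuple(dims[i] for i in kept)] = dict(totals)
    return cube


cube = data_cube(SALES, DIMS)
print(len(cube))               # number of aggregations in the cube
print(cube[("week",)])         # 1-dim. aggregation: sales by week
print(cube[("item", "store")]) # 2-dim. aggregation: sales by item and store
```

Real OLAP tools avoid recomputing each aggregation from the raw rows; this sketch does not attempt that.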

14 KDD steps Data selection Data pre-processing Data enrichment Data reduction and projection Data mining Interpretation and reporting Presence of steps and order not fixed

15 KDD steps Data selection: which records, variables chosen?

16 KDD steps Data selection Data pre-processing: removing noise, duplicate records, handling missing data, …

17 KDD steps Data selection Data pre-processing Data enrichment: combining the selected data with external data

18 KDD steps Data selection Data pre-processing Data enrichment Data reduction and projection: reduction in number, reducing dimension

19 KDD steps Data selection Data pre-processing Data enrichment Data reduction and projection Data mining: uncovering information, interesting patterns

20 KDD steps Data selection Data pre-processing Data enrichment Data reduction and projection Data mining Interpretation and reporting: evaluating, understanding, communicating
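As one concrete instance of the pre-processing step listed above, the following sketch removes duplicate records and fills in missing values. Filling with the column mean is an assumption made for illustration; the slides do not prescribe how missing data should be handled, and `preprocess` and `RAW` are hypothetical names.

```python
def preprocess(records):
    """Drop exact duplicate records, then replace None by the column mean."""
    unique = list(dict.fromkeys(records))  # keeps first occurrences, in order
    n_cols = len(unique[0])
    means = []
    for c in range(n_cols):
        known = [r[c] for r in unique if r[c] is not None]
        means.append(sum(known) / len(known) if known else None)
    return [
        tuple(m if v is None else v for v, m in zip(r, means))
        for r in unique
    ]


# Hypothetical numeric records with one duplicate and one missing value
RAW = [(1.0, 2.0), (1.0, 2.0), (3.0, None), (5.0, 4.0)]
clean = preprocess(RAW)
print(clean)
```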

21 Data mining Segmentation Dependency analysis Deviation and outlier analysis Trend detection Generalization and characterization

22 DM - segmentation Description: Clustering: finding a finite set of implicit classes Classification: mapping data items into pre-defined classes Techniques: Cluster analysis Bayesian classification Decision or classification trees Artificial neural networks

23 DM - segmentation [Figure: clustering; classification with given classes]
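The difference between the two forms of segmentation can be sketched on one-dimensional data. Classification assigns values to classes that are given in advance; clustering has to find the classes itself. The basic k-means loop and nearest-center rule below are stand-ins chosen for brevity, not the specific techniques of the seminar, and all names and numbers are made up.

```python
def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def classify(values, class_centers):
    """Classification: the classes (here, their centers) are pre-defined."""
    return [nearest(v, class_centers) for v in values]


def cluster(values, k, rounds=10):
    """Clustering: find k implicit classes with a basic k-means loop."""
    step = max(1, len(values) // k)
    centers = sorted(values)[::step][:k]  # crude, deterministic start
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, [nearest(v, centers) for v in values]


VALUES = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centers, labels = cluster(VALUES, k=2)
given = classify([2.0, 8.0], class_centers=[0.0, 10.0])
print(centers, labels, given)
```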

24 DM – dependency analysis Description: Finding rules to predict the value of some attribute based on other attributes Techniques: Bayesian networks Association rules Example tuples, with unknown values marked ???: (4, 12, 0.24) (3, 14, 0.21) (7, 13, 0.43) (2, 9, 0.78) (11, 11, 0.55) (5, 11, ???) (???, 12, 0.51)

25 DM – dependency analysis Confidence and support measures for association rules of the form: [ if X then Y ]: confidence = #(X and Y in DB) / #(X in DB) support = #(X and Y in DB) / #(all in DB)
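The two measures can be computed directly from their definitions. This is a small sketch assuming the database is a list of item sets; the transactions in `DB` are invented for the example.

```python
def support_and_confidence(transactions, x, y):
    """Support and confidence of the rule [if X then Y].

    support    = #(X and Y in DB) / #(all in DB)
    confidence = #(X and Y in DB) / #(X in DB)
    """
    x, y = set(x), set(y)
    n_x = sum(1 for t in transactions if x <= t)         # records containing X
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # records containing X and Y
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical database of four transactions
DB = [
    {"bread", "butter"},
    {"bread"},
    {"bread", "butter", "milk"},
    {"milk"},
]
support, confidence = support_and_confidence(DB, {"bread"}, {"butter"})
print(support, confidence)
```

Here X = {bread} occurs in 3 of 4 records and X together with Y = {butter} in 2, so the support is 2/4 and the confidence is 2/3.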

26 DM – deviation & outlier analysis Description: Finding data with unusual deviations (=errors, or data of particular interest) Techniques: Clustering, other mining methods Outlier analysis
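One very simple way to flag unusual deviations is a z-score test: mark values that lie more than a chosen number of standard deviations from the mean. This is only an illustration; the threshold of 2.0 and the readings are assumptions, and the slide does not name a specific outlier method.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values equal: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical measurements; 50 is either an error or of particular interest
READINGS = [10, 11, 9, 10, 12, 10, 50]
outliers = zscore_outliers(READINGS)
print(outliers)
```

A single extreme value inflates both the mean and the standard deviation, so this test can miss outliers in small samples.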

27 DM – trend detection Description: Finding lines, curves, summarizing the database (often as a function over time) Techniques: Regression Sequential pattern extraction
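For the regression technique, a least-squares line through (time, value) pairs is the simplest case. The sketch below fits such a line by hand; the data points are made up and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the xs are not all equal
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly measurements that grow by 2 per time step
TIMES = [0, 1, 2, 3]
VALUES = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(TIMES, VALUES)
print(slope, intercept)
```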

28 DM – generalization and characterization Description: Obtaining compact descriptions of the data Techniques: Summary rules Attribute-oriented induction [Figure: concept hierarchy, from low level concept to higher level concept]
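A rough sketch of the core step of attribute-oriented induction: replace low level concepts by higher level ones from a concept hierarchy, then merge identical tuples and count them. The one-level hierarchy of tree species and the parcel records are invented for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: low level concept -> higher level concept
HIERARCHY = {
    "oak": "deciduous forest",
    "beech": "deciduous forest",
    "pine": "coniferous forest",
    "spruce": "coniferous forest",
}


def generalize(records, hierarchy, column):
    """Climb one level in the hierarchy for one attribute and merge equal tuples."""
    counts = Counter()
    for rec in records:
        rec = list(rec)
        rec[column] = hierarchy.get(rec[column], rec[column])  # unknown values stay as-is
        counts[tuple(rec)] += 1
    return counts


# Hypothetical land parcels: (tree species, region)
PARCELS = [
    ("oak", "north"),
    ("beech", "north"),
    ("pine", "north"),
    ("spruce", "south"),
    ("pine", "south"),
]
summary = generalize(PARCELS, HIERARCHY, column=0)
print(summary)
```

Five records shrink to three generalized tuples with counts, which is the kind of compact description the slide refers to.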

29 Visualization and knowledge discovery KDD is difficult to automate → steered by human intelligence Visualization helps to understand the data and which data mining techniques to try

30 KD + geography Special case of KDD Other special cases –marketing –biology –astronomy Main features: location, distance, dimensionality (with dependent dimensions)

31 KD + geography (attr1, attr2, attr3, attr4); attr’s are numbers and (relatively) independent: statistics. (attr1, attr2, attr3, attr4); attr’s can also be on other measurement scales: KDD. (attr1, attr2, attr3, attr4); attr’s are often dependent and can be shapes: KD + geography. Often: (lat., long., attr1, attr2, …) or: (shape description, attr1, attr2, …)

32 KD + geography Study of scalable versions of DM tasks (in lat. and long.) Certain dimensions can be non-metric (travel time need not be symmetric) DM in data that is not in the form of tuples: sets of thematic map layers

33 Geographic data mining Spatial segmentation (clustering, classification) Spatial dependency (spatial association rules) Spatial trend detection Geographic characterization and generalization

34 GDM – spatial association rules Example: If a location is within 500 m from water and the average winter temperature is at least –2 degrees then there are frogs around. (The condition "within 500 m from water" is a distance relationship.)
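The frog rule can be evaluated like any association rule once the distance relationship is turned into a predicate. The sketch below assumes water bodies are given as points and uses planar coordinates in metres; the sites, temperatures and frog observations are all invented.

```python
import math

# Hypothetical water locations (x, y) in metres
WATER = [(0, 0), (2000, 0)]

# Hypothetical sites: ((x, y), average winter temperature, frogs observed)
SITES = [
    ((300, 300), 1.0, True),
    ((100, 0), -5.0, False),
    ((2100, 100), 0.0, True),
    ((1000, 1000), 3.0, False),
    ((1900, 0), -1.0, False),
]


def near_water(point, water, max_dist=500.0):
    """Distance relationship: is the point within max_dist of some water location?"""
    return any(math.hypot(point[0] - wx, point[1] - wy) <= max_dist for wx, wy in water)


def rule_measures(sites, water):
    """Support and confidence of: near water and temp >= -2  =>  frogs."""
    n_x = n_xy = 0
    for point, temp, frogs in sites:
        if near_water(point, water) and temp >= -2.0:
            n_x += 1
            if frogs:
                n_xy += 1
    support = n_xy / len(sites)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


support, confidence = rule_measures(SITES, WATER)
print(support, confidence)
```

With real GIS data the water would be polygons or polylines and the distance test would need a spatial index to scale; neither is attempted here.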

35 GDM – spatial trend detection Patterns of change with respect to neighborhood of some object Example: (North America) Further from Pacific ocean → fewer earthquakes

36 GDM - applications Map interpretation Remote sensing interpretation Environmental mapping (soil type, etc.) Extracting spatio-temporal patterns for cyclones, crimes Spatial interaction (movement/flow of people, capital, goods)

37 Conclusions GDM & GKD is an extension of (tool for) geographical analysis GDM is different from DM due to –Geographic spaces, not attribute space –Neighborhood is extremely important –Scale issues –Data is different –Applications (interesting patterns to mine for) are different

38 This seminar on GDM First: chapters from the book –CH 1: GDM & KD: an overview (today) –CH 2: Paradigms for spatial and spatio-temporal DM (11-9) –CH 3: Fundamentals of spatial DW for GKD (15-9) –CH 7: Algorithms and applications of SDM (Ronny) (18-9) –CH 8: Spatial clustering in DM (22-9) –CH 6: Modeling spatial dependencies (25-9) (not: 29-9 and 2-10) –CH 9: Detecting outliers (6-10) –CH 10: Knowledge construction based on GVis and KDD –CH 14: Mining mobile trajectories

39 This seminar All PowerPoint presentations on the Web page of the course Survey paper or written exam; possible topics for survey: –Hierarchical clustering –Clustering with obstacles –Proximity relationship mining –… Or: joint survey of (geometric) algorithms for GDM

40 Each presentation The chapter contents Additional (spatial) examples (from the Web links or self-constructed) Detect and present algorithmic problems that appear → together: report on algorithmic issues in GDM Present your chapter; don’t be afraid of overlap with other chapters
