Structural Knowledge Discovery Used to Analyze Earthquake Activity
Jesus A. Gonzalez, Lawrence B. Holder, Diane J. Cook

MOTIVATION AND GOAL
- Need to analyze large amounts of information in real-world databases.
- Find information that standard tools cannot detect.
- Application domain: an earthquake database.
- Prior knowledge: spatio-temporal relations.

SUBDUE KNOWLEDGE DISCOVERY SYSTEM
- SUBDUE discovers patterns (substructures) in structural data sets.
- SUBDUE represents data as a labeled graph.
- Inputs: vertices and edges.
- Outputs: discovered patterns and their instances.

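EXAMPLE
[Figure: example graph with vertex labels "object", "triangle" and "square" and edge labels "on" and "shape"; 4 instances of the pictured substructure.]
- Vertices: objects or attributes.
- Edges: relationships.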
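The labeled-graph representation in the example can be sketched in Python. This is only an illustration under stated assumptions: the `Edge` tuple, `build_example_graph`, and `find_triangle_on_square` are hypothetical names, the data layout is not SUBDUE's input format, and the matcher looks for one fixed pattern by hand rather than performing SUBDUE's substructure search.

```python
from collections import namedtuple

# Hypothetical edge record: directed, labeled edge between two vertex ids.
Edge = namedtuple("Edge", ["source", "target", "label"])


def build_example_graph(copies=4):
    """Build a graph containing `copies` triangle-on-square structures."""
    vertices = {}  # vertex id -> vertex label
    edges = []
    for i in range(copies):
        base = 4 * i
        vertices[base] = "object"        # the upper object
        vertices[base + 1] = "triangle"  # its shape attribute
        vertices[base + 2] = "object"    # the lower object
        vertices[base + 3] = "square"    # its shape attribute
        edges.append(Edge(base, base + 1, "shape"))
        edges.append(Edge(base + 2, base + 3, "shape"))
        edges.append(Edge(base, base + 2, "on"))
    return vertices, edges


def find_triangle_on_square(vertices, edges):
    """Return (upper, lower) object pairs where a triangle is on a square."""
    # Map each object vertex to the label of its shape attribute.
    shape_of = {e.source: vertices[e.target] for e in edges if e.label == "shape"}
    return [
        (e.source, e.target)
        for e in edges
        if e.label == "on"
        and shape_of.get(e.source) == "triangle"
        and shape_of.get(e.target) == "square"
    ]
```

Calling `find_triangle_on_square(*build_example_graph())` returns one pair per copy of the structure, which mirrors the idea of a pattern together with its instances.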

EVALUATION CRITERION
- Minimum encoding.
- Graph compression.
- Substructure size (tried, but did not work).

EVALUATION CRITERION: MINIMUM DESCRIPTION LENGTH
- Minimum Description Length (MDL) principle: the best theory to describe a set of data is the one that minimizes the description length (DL) of the entire data set.
- DL of the graph: the number of bits necessary to completely describe the graph.
- Search for the substructure that results in the maximum compression.
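The compression idea can be illustrated with a rough calculation. The bit-count below is a deliberately simplified stand-in, not SUBDUE's actual graph encoding: it charges each vertex for its label and each edge for two endpoints plus a label. The function names, the assumption of non-overlapping instances, and the assumption that every instance collapses to a single new vertex are all illustrative.

```python
import math


def description_length(num_vertices, num_edges, num_vertex_labels, num_edge_labels):
    """Approximate number of bits needed to describe a labeled graph.

    Simplified assumption: log2(#labels) bits per vertex label, and per edge
    two vertex indices plus one edge label.
    """
    if num_vertices == 0:
        return 0.0
    vertex_bits = num_vertices * math.log2(max(num_vertex_labels, 2))
    index_bits = math.log2(max(num_vertices, 2))
    label_bits = math.log2(max(num_edge_labels, 2))
    return vertex_bits + num_edges * (2 * index_bits + label_bits)


def compression_ratio(graph, sub, instances, num_vertex_labels, num_edge_labels):
    """Return DL(G) / (DL(S) + DL(G|S)); larger means better compression.

    `graph` and `sub` are (num_vertices, num_edges) pairs. Each instance of
    the substructure is replaced by one new vertex with a new label.
    """
    g_vertices, g_edges = graph
    s_vertices, s_edges = sub
    dl_graph = description_length(g_vertices, g_edges, num_vertex_labels, num_edge_labels)
    dl_sub = description_length(s_vertices, s_edges, num_vertex_labels, num_edge_labels)
    compressed_vertices = g_vertices - instances * (s_vertices - 1)
    compressed_edges = g_edges - instances * s_edges
    dl_compressed = description_length(
        compressed_vertices, compressed_edges, num_vertex_labels + 1, num_edge_labels
    )
    return dl_graph / (dl_sub + dl_compressed)
```

Under this toy encoding, a substructure with many instances scores higher than the same substructure with few instances, which is the behavior the search is meant to reward.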

THE EARTHQUAKE DATABASE
- Several catalogs.
- Sources such as the National Geophysical Data Center.
- Each record has 35 fields describing the characteristics of the earthquake.

THE EARTHQUAKE DATABASE KNOWLEDGE REPRESENTATION

THE EARTHQUAKE DATABASE: PRIOR KNOWLEDGE
- Connections between events whose epicenters were close to each other in distance (<= 75 kilometers).
- Connections between events that happened close to each other in time (<= 36 hours).
- Spatio-temporal relations are represented with “near_in_distance” and “near_in_time” edges.
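One way these edges could be generated is sketched below. The 75 km and 36 hour thresholds and the two edge labels come from the slide; everything else is an assumption made for illustration: the event fields `lat`, `lon` and `time`, the function names, and the use of great-circle (haversine) distance between epicenters, which ignores depth.

```python
import math
from datetime import timedelta
from itertools import combinations

MAX_DISTANCE_KM = 75.0             # threshold from the slide
MAX_TIME_GAP = timedelta(hours=36)  # threshold from the slide
EARTH_RADIUS_KM = 6371.0           # mean Earth radius (assumed distance model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatio_temporal_edges(events):
    """Return (i, j, label) edges for every pair of events that is close.

    `events` is a list of dicts with hypothetical keys "lat", "lon" (degrees)
    and "time" (datetime). A pair may receive both edge labels.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= MAX_DISTANCE_KM:
            edges.append((i, j, "near_in_distance"))
        if abs(a["time"] - b["time"]) <= MAX_TIME_GAP:
            edges.append((i, j, "near_in_time"))
    return edges
```

Comparing every pair is quadratic in the number of events, which is acceptable for a sketch; a spatial index would be a natural refinement for large catalogs.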

DETERMINING EARTHQUAKE ACTIVITY
- Geologist: Dr. Burke Burkart.
- Study of the seismic activity caused by the Orizaba Fault.
- Fault: a fracture in a surface along which rocks have also been displaced.
- Selection of the area of study, two rectangular regions:
  - First: longitude 94.0W through 101.0W and latitude 17.0N through 18.0N.
  - Second: longitude 94.0W through 98.0W and latitude 18.0N through 19.0N.

[Figure: area of study.]

DETERMINING EARTHQUAKE ACTIVITY
- Divide the area into 44 rectangles, each one half of a degree in both longitude and latitude.
- Sample the earthquake activity in each sub-area.
- Run Subdue on each sub-area.
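The subdivision can be checked with a short sketch. The two regions and the half-degree cell size are taken from the slides, with west longitudes written as negative degrees; the function names and the cell numbering are assumptions, so the indices produced here need not match the sub-area numbers used in the presentation.

```python
CELL_SIZE_DEG = 0.5

# (west, east, south, north) in degrees; west longitudes are negative.
REGIONS = [
    (-101.0, -94.0, 17.0, 18.0),  # first region from the slides
    (-98.0, -94.0, 18.0, 19.0),   # second region from the slides
]


def build_cells(regions=REGIONS, size=CELL_SIZE_DEG):
    """Return the south-west corner (west, south) of every grid cell."""
    cells = []
    for west, east, south, north in regions:
        n_cols = round((east - west) / size)
        n_rows = round((north - south) / size)
        for row in range(n_rows):
            for col in range(n_cols):
                cells.append((west + col * size, south + row * size))
    return cells


def cell_index(lat, lon, cells, size=CELL_SIZE_DEG):
    """Return the index of the cell containing the point, or None if outside."""
    for idx, (west, south) in enumerate(cells):
        if west <= lon < west + size and south <= lat < south + size:
            return idx
    return None
```

The first region contributes 14 x 2 cells and the second 8 x 2, so `len(build_cells())` gives the 44 sub-areas mentioned above. Events can then be grouped by `cell_index` before running the discovery step on each group.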

DETERMINING EARTHQUAKE ACTIVITY

- Substructure 1 (with 19 instances) and substructure 2 (with 8 instances) were found in sub-area 26.

DETERMINING EARTHQUAKE ACTIVITY
- This pattern might give us information about the cause of the earthquakes.
- Subduction also affects this area, but at a specific depth that depends on the closeness to the Pacific Ocean.

SUBDUE’S POTENTIAL
- Subdue finds not only shared characteristics of events, but also spatial relations between them.
- Dr. Burke Burkart is studying the patterns to give direction to this research.
- We expect to find patterns representing parts of the path of the fault involved.
- Time relations not considered by Subdue.
- Earthquake characteristics.
- Important for other areas.

CONCLUSION
- Subdue was successful on real-world databases.
- Subdue used prior knowledge, in the form of temporal and spatial relations, to guide the search.
- Subdue discovered interesting patterns using these temporal and spatial relations.
- Subdue is being used as the data mining tool to study the “Orizaba Fault” in Mexico.

FUTURE WORK
- Concept Learning Subdue.
- Theoretical analysis.
  - Bounds on complexity (e.g. PAC learning).
- Graphical user interface to visualize substructures and their instances.