

Ernestina Menasalvas Ruiz, Pedro Sousa

GOAL: extract knowledge from aviation data sources to obtain patterns that help detect incidents; learn behaviour models.

What is Data Mining? Many definitions:
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
(© Tan, Steinbach, Kumar, Introduction to Data Mining)
[Figure: KDD process]

CRISP-DM (Cross-Industry Standard Process for Data Mining): Business Understanding, Data Understanding, Data Preparation, Model, Evaluate [Figure: ARSS fleet]

Challenges
– Data integration:
  – Aircraft information
  – Context: sensors, space weather, location, weather
  – Operations: pre-flight, departure, climb, en route, arrival, taxiing, post-flight
  – Aviation safety reports
– Dynamic and complex data: the theoretical and practical aspects of the algorithms have to be analyzed to discover the most appropriate techniques: trend analysis, association of events, data stream methods, context integration, resource awareness

GOAL (cont.): apply algorithms to mine the various data sources for information
– to identify patterns: atypical flights, anomalous cockpit procedures, groups of safety reports
BUT:
– KDD is a process
– Static vs. dynamic data

KDD process: approx. 80% of the effort

Data Exploration and Transformation
Exploration of the data to better understand its characteristics:
– Helps to select the right tool for preprocessing or analysis
– Makes use of humans' abilities to recognize patterns
– Integrates the semantics of the data
– Clustering and anomaly detection will be used as exploratory techniques
Transform the data prior to mining so as to be able to extract useful patterns

Data Mining Tasks
Prediction (supervised learning)
– Use some historical information to learn a model that can help to predict unknown or future values of some variable
– Basis for forecasting
– Classification, regression, deviation detection
Description (unsupervised learning)
– Find patterns that describe the data
– Clustering
– Association rule discovery
– Sequential pattern discovery

Classification
Given a collection of records in which the class is known:
– Find a model able to describe the class given the values of the remaining attributes
– Measurements have to be used to validate the model and determine the accuracy of its predictions (train and test)
Techniques:
– Decision tree induction (C4.5, ID3): very efficient in terms of execution time; very intuitive results
– Neural networks: the result is a neural network (a black box); robust, but not intuitive
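The split criterion used by ID3 can be sketched briefly. This is a minimal sketch of information gain only, assuming nominal attributes stored as Python dicts. The attribute names and toy flight records are hypothetical, and the sketch covers attribute selection only. It does not cover tree growth, pruning, or C4.5's gain ratio.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting `records` on `attribute` (ID3 criterion)."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def best_split(records, attributes, target):
    """ID3 places the attribute with the highest information gain at the next tree node."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical toy records; attribute names are illustrative only.
flights = [
    {"phase": "climb", "weather": "storm", "incident": "yes"},
    {"phase": "climb", "weather": "clear", "incident": "no"},
    {"phase": "enroute", "weather": "storm", "incident": "yes"},
    {"phase": "enroute", "weather": "clear", "incident": "no"},
    {"phase": "arrival", "weather": "storm", "incident": "yes"},
    {"phase": "arrival", "weather": "clear", "incident": "no"},
]
print(best_split(flights, ["phase", "weather"], "incident"))  # weather
```

In this toy data, `weather` separates the classes perfectly (gain of 1 bit), while `phase` carries no information (gain of 0). A real evaluation would still hold out a test set to measure accuracy, as the slide notes.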

Regression analysis
– Linear regression: $Y = \alpha + \beta X$, where $\alpha$ and $\beta$ specify the line and are estimated from the data
– Multiple regression: $Y = b_0 + b_1 X_1 + b_2 X_2$
– Log-linear models: the table of joint probabilities is approximated by a product of lower-order tables
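The two parameters $\alpha$ and $\beta$ of the linear model are commonly estimated by ordinary least squares. This is a minimal sketch, assuming a single predictor and plain Python lists; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of alpha and beta in Y = alpha + beta * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # beta = covariance(X, Y) / variance(X); alpha makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


alpha, beta = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(alpha, beta)  # 1.0 2.0
```

Multiple regression generalizes this to solving the normal equations for $b_0, b_1, b_2$, usually with a linear-algebra library rather than by hand.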

Clustering
Given a set of (unclassified) records, group them so that:
– records in the same cluster are more similar to one another;
– records in separate clusters are less similar to one another.
Similarity measures have to be defined:
– special attention must be paid to how the distance is understood
Approaches:
– Partitioning algorithms: first build different partitions, then evaluate them (e.g. K-means)
– Hierarchical: build a hierarchical decomposition
– Density-based: use density functions
– Kohonen networks [Kohonen '95]
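K-means alternates two steps: it assigns each record to its nearest centroid, then moves each centroid to the mean of its cluster. This is a minimal sketch of Lloyd's version, assuming numeric tuples, Euclidean distance (one possible similarity measure) and random initial centroids. The toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on numeric tuples using Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct records
    clusters = []
    for _ in range(iterations):
        # Assignment step: every record joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

The result depends on the distance chosen and on the initial centroids. This is why the slide stresses defining the similarity measure carefully.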

Association Rule Discovery
Given a set of records described by a set of attributes:
– Find associations among attribute values
– Once associations are discovered, rules can be obtained
– Confidence vs. support
– Apriori algorithm
Example: At1=1 and At3=1 and At4=1
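Support and confidence can be made concrete with a few lines of code. This minimal sketch assumes each record is represented as the set of binary attributes equal to 1; the records themselves are hypothetical. Support is the fraction of records containing an itemset. The confidence of A → C is support(A ∪ C) / support(A).

```python
def support(records, itemset):
    """Fraction of records containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A ∪ C) / support(A)."""
    return support(records, set(antecedent) | set(consequent)) / support(records, antecedent)


# Each hypothetical record lists the attributes whose value is 1.
records = [
    {"At1", "At3", "At4"},
    {"At1", "At3", "At4"},
    {"At1", "At3"},
    {"At2", "At4"},
]
print(support(records, {"At1", "At3", "At4"}))      # 0.5
print(confidence(records, {"At1", "At3"}, {"At4"}))  # about 0.667
```

Here "At1=1 and At3=1 → At4=1" holds in half of all records (support) and in two of the three records where its antecedent holds (confidence).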

Association algorithms
The problem of finding association rules can be divided in two:
– Find the sets of items (products) that have the minimum support (frequent itemsets)
– Use the frequent itemsets to generate the rules
Apriori [Agrawal '93]
– Advantages: Apriori and its variants are the most widely used algorithms for this kind of analysis; efficient on large volumes of data
– Disadvantages: memory consumption
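The two phases can be sketched in plain Python. This is a minimal, unoptimized sketch of level-wise Apriori plus rule generation, assuming in-memory transactions as sets; the baskets are hypothetical. The candidate sets held at each level illustrate where the memory cost mentioned above comes from.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}  # level 1
    frequent = {}
    k = 1
    while candidates:
        # Count step: one pass over the data per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets of a candidate must be frequent.
        candidates = {c for c in joined if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def generate_rules(frequent, min_confidence):
    """Rules A -> C with confidence = support(A ∪ C) / support(A) >= min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[antecedent]  # antecedent is frequent by the Apriori property
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


baskets = [{"At1", "At3", "At4"}, {"At1", "At3"}, {"At1", "At4"}, {"At3", "At4"}, {"At1", "At3", "At4"}]
frequent = apriori(baskets, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.7)
```

With these baskets, all single items and pairs are frequent, but the triple {At1, At3, At4} appears in only 2 of 5 baskets. It is therefore dropped, leaving six pairwise rules.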

Challenges of the algorithms
Algorithms to find anomalies in large datasets have to be:
– fast
– scalable
– accurate
Algorithms have to be able to deal with:
– continuous sequences, representing sensor data such as airspeed and altitude
– discrete sequences, such as sequences of pilot switch presses
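The two kinds of sequences call for different anomaly scores. The baselines below are deliberately simple, not the techniques the authors use. A z-score flags continuous readings far from the series mean. An unseen-transition check flags discrete events, such as switch presses, that follow an order never observed in normal sequences. All values and switch names are hypothetical.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Indices of continuous readings whose z-score (vs. the whole series) exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def unseen_transitions(normal_sequences, sequence):
    """Positions in a discrete sequence reached through a transition never seen in normal data."""
    known = {(a, b) for seq in normal_sequences for a, b in zip(seq, seq[1:])}
    return [i for i, pair in enumerate(zip(sequence, sequence[1:]), start=1) if pair not in known]


airspeed = [250.0] * 20 + [400.0]  # hypothetical readings
print(zscore_anomalies(airspeed))  # [20]

normal = [["GEAR_UP", "FLAPS_UP", "AP_ON"], ["GEAR_UP", "FLAPS_UP", "AP_OFF"]]
print(unseen_transitions(normal, ["GEAR_UP", "AP_ON", "FLAPS_UP"]))  # [1, 2]
```

A global z-score ignores flight phases. A more realistic version would compute it per phase or over a sliding window.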

Data streams vs static data

Data streams
Challenges for the algorithms:
– Processing data in a single pass
– Generating models incrementally
– Detecting model changes over time
– Limiting memory usage and computing time
– Possibly automating the evaluation process
A data stream:
– is potentially unbounded in size
– needs to be analyzed over time
– arrives at a very high rate
– has an underlying model that evolves over time
[Aggarwal et al.] “Data Streams: Models and Algorithms”. Advances in Database Systems, Springer, 2007
[Aguilar-Ruiz, Gama] “Data Streams”. Journal of Universal Computer Science, 2005
[Barbará] “Requirements for clustering data streams”. SIGKDD’02
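Even simple statistics can be maintained under the single-pass, bounded-memory constraints listed above. This minimal sketch uses Welford's online algorithm for the mean and variance of a numeric stream. Each example is seen once, and only three numbers are stored regardless of stream length. The class name is illustrative.

```python
class RunningStats:
    """Single-pass mean/variance (Welford's algorithm): O(1) memory, one update per example."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the updated mean

    @property
    def variance(self):
        """Population variance of the examples seen so far."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # examples processed as they arrive, never revisited
    stats.update(reading)
```

After this stream, `stats.mean` is 5 and `stats.variance` is 4 (up to floating-point rounding). The same state could serve as an incrementally updated model of the stream.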

Goal
New challenges introduced by evolving data, such as:
– resource-aware learning
– change detection
– novelty detection
– important application areas where data evolution must be taken into account
– how learning under constraints (time, storage capacity and other resources) is affected by data evolution
– how context can help the learning process

Change and concept drift [Joao Gama 2010]
Concept drift: the underlying concept may shift unexpectedly from time to time. Changes arise from:
– adversary actions
– varying personal interests
– changing populations
– complex environments

Required features
– Examples have to be processed as they arrive
– Each example should be processed:
  – in small constant time
  – with a fixed amount of main memory
  – in a single scan of the data
  – without revisiting old records (or revisiting them as little as possible)
– Produce models equivalent to those that would be obtained by a batch data-mining algorithm
– Detect and react to concept drift
[Joao Gama 2010]
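One well-known way to detect and react to concept drift is the Drift Detection Method (DDM) of Gama et al. (2004). It monitors a classifier's online error rate p and its standard deviation s. It signals a warning when p + s exceeds the recorded minimum by 2·s_min and a drift when it exceeds it by 3·s_min. The slides do not say which detector the authors use, so this is a minimal sketch of DDM under that assumption. Each update is constant time and memory.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004) on a stream of 0/1 prediction errors."""

    def __init__(self, min_examples=30):
        self.min_examples = min_examples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 if the model misclassified the example, else 0; returns 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental error rate
        self.s = math.sqrt(max(0.0, self.p * (1 - self.p)) / self.n)
        if self.n < self.min_examples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # best error level seen so far
        if self.p + self.s > self.p_min + 3 * self.s_min:
            self.reset()  # drift: start monitoring the new concept from scratch
            return "drift"
        if self.p + self.s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


detector = DDM()
errors = ([0] * 9 + [1]) * 20 + [1] * 50  # 10% error rate, then the concept changes
statuses = [detector.update(e) for e in errors]
```

The warning level is where a learner for the possible new concept can be prepared. The drift level is where the current model is replaced.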

Recurrent concepts
Many learning algorithms deal with concept drift:
– Based on time windows, ensembles, drift detection
– FLORA, SEA, DWM, DMM, ...
What about recurrent concepts?
– A particular type of concept drift
– With forgetting mechanisms, past data and models are discarded
– However, it is common for concepts to reappear

Context and data stream

Context
– Context representation
– Context similarity: numeric attributes; nominal attributes
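The slide's similarity formulas did not survive extraction, so the following is only a hypothetical sketch of one common choice, not the authors' definition. A numeric attribute contributes its absolute difference normalized by the attribute's range. A nominal attribute contributes 0 when the values match and 1 otherwise. The result is averaged over attributes. The context attributes and ranges are illustrative.

```python
def context_distance(c1, c2, ranges):
    """Hypothetical distance between two context states given as {attribute: value} dicts.

    Numeric attributes (those listed in `ranges` as (min, max)) use |a - b| / (max - min);
    nominal attributes use 0 if equal, else 1. The result is the mean over attributes.
    """
    total = 0.0
    for attr in c1:
        a, b = c1[attr], c2[attr]
        if attr in ranges:  # numeric attribute
            lo, hi = ranges[attr]
            total += abs(a - b) / (hi - lo) if hi > lo else 0.0
        else:  # nominal attribute
            total += 0.0 if a == b else 1.0
    return total / len(c1)


current = {"altitude": 10000, "weather": "clear"}
stored = {"altitude": 12000, "weather": "storm"}
print(context_distance(current, stored, ranges={"altitude": (0, 40000)}))  # about 0.525
```

A similarity can then be taken as 1 minus this distance.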

Context integration
We want to integrate context information with previously learned models.
– freqC is the most frequent context in a sequence of context states $\{C_1, C_2, \ldots, C_n\}$
– Concept history with associated context: $h(M_k \mid C_i)$ is an estimate of how likely it is that $M_k$ represents the current underlying concept, given the current context $C_i$
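Both quantities on this slide can be estimated from counts. This sketch makes that assumption: freqC is the mode of the observed context states, and $h(M_k \mid C_i)$ is the fraction of recorded periods with most frequent context $C_i$ in which $M_k$ was the active model. The class and method names are hypothetical, and the authors' estimator may differ.

```python
from collections import Counter, defaultdict


def most_frequent_context(contexts):
    """freqC: the most frequent context state in a sequence {C1, ..., Cn}."""
    return Counter(contexts).most_common(1)[0][0]


class ConceptHistory:
    """Hypothetical frequency estimate of h(M_k | C_i)."""

    def __init__(self):
        self._counts = defaultdict(Counter)  # context -> Counter of model ids

    def record(self, model_id, context):
        """Note that `model_id` was the active concept in a period whose freqC was `context`."""
        self._counts[context][model_id] += 1

    def h(self, model_id, context):
        seen = self._counts.get(context)
        if not seen:
            return 0.0
        return seen[model_id] / sum(seen.values())


history = ConceptHistory()
history.record("M1", most_frequent_context(["storm", "storm", "clear"]))
history.record("M1", "storm")
history.record("M2", "storm")
```

With these records, `history.h("M1", "storm")` is 2/3. An unseen context returns 0.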

Model storage
For a model $M_k$, store:
– the period k in which the model was used
– (using NB requires storing the CV)
– the most frequent context freqC for period k
– the accuracy of the model while it was in use
Represented as the tuple:

Model retrieval
For a model $M_k$:
– using a sample $S_n$ of recent records, compute the MSE for $M_k$
– get freqC for $S_n$
– use the history $h(M_k \mid freqC)$
The utility is defined based on model accuracy (highest) and similarity to the current context (minimum distance). Retrieve the model with the highest utility as:
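The storage tuple and utility formula were lost in extraction, so storage and retrieval can only be sketched under assumptions. In this sketch, each stored model keeps its period, a prediction function, its freqC and its accuracy. The utility rewards a low MSE on the recent sample $S_n$ and penalizes a stored freqC that differs from the current one, using a nominal 0/1 distance. The history term $h$ is not modeled, and `StoredModel`, `retrieve` and `weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass
class StoredModel:
    """Hypothetical storage record for M_k: (period k, model, freqC, accuracy while in use)."""
    period: int
    model: Callable[[float], float]
    freq_context: Hashable
    accuracy: float


def mse(model, sample):
    """Mean squared error of `model` on a sample of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in sample) / len(sample)


def retrieve(repository, sample, current_context, weight=1.0):
    """Return the stored model with the highest (assumed) utility on the recent sample S_n."""
    def utility(m):
        context_distance = 0.0 if m.freq_context == current_context else 1.0
        return -mse(m.model, sample) - weight * context_distance
    return max(repository, key=utility)


repository = [
    StoredModel(1, lambda x: 2 * x, "clear", 0.9),
    StoredModel(2, lambda x: 2 * x + 10, "storm", 0.8),
]
recent = [(1, 2), (2, 4), (3, 6)]
print(retrieve(repository, recent, "storm").period)  # 1: its accuracy outweighs the context mismatch
```

When two stored models fit the sample equally well, the context term breaks the tie in favor of the one learned in a similar context.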

CALDS: learning process
– Incrementally learn the underlying concept
– When a warning is signaled: prepare a new base learner for the possible new concept (anticipate the drift)
– When drift is detected: store the current model; reuse a previously learned model when the underlying concept is recurrent
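The reaction to warning and drift signals can be sketched as a small state machine. This is a hypothetical sketch of the control flow described on the slide, not the CALDS implementation. `new_learner` builds a fresh base learner. `find_recurrent` stands for a retrieval step that returns a stored model, or `None` if the concept is new. Signals are assumed to come from a detector such as DDM ("stable", "warning", "drift").

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class RecurringLearner:
    """Hypothetical control loop: anticipate drift on warning, store/reuse models on drift."""
    current: Any
    new_learner: Callable[[], Any]
    repository: List[Any] = field(default_factory=list)
    backup: Optional[Any] = None

    def on_signal(self, signal: str, find_recurrent: Callable[[List[Any]], Optional[Any]]) -> None:
        if signal == "warning" and self.backup is None:
            self.backup = self.new_learner()          # anticipate the drift
        elif signal == "drift":
            reused = find_recurrent(self.repository)  # search previously stored models only
            self.repository.append(self.current)      # store the current model
            if reused is not None:
                self.current = reused                 # recurrent concept: reuse an old model
            elif self.backup is not None:
                self.current = self.backup            # new concept: use the learner prepared at warning
            else:
                self.current = self.new_learner()
            self.backup = None
        elif signal == "stable":
            self.backup = None                        # false alarm: drop the prepared learner


learner = RecurringLearner(current="M1", new_learner=lambda: "M_new")
learner.on_signal("warning", lambda repo: None)
learner.on_signal("drift", lambda repo: None)                         # new concept
learner.on_signal("drift", lambda repo: repo[0] if repo else None)  # M1's concept reappears
```

After these three signals, the current model is "M1" again and the repository holds both previous models.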

CALDS: learning process

Improvements from integrating context – overall accuracy: 72.5%; 69.6%; 62.2%

SOME PREVIOUS EXPERIENCE

Other current applications: ESA (European Space Agency) – Event Reporting Tool for unmanned satellite passes (CryoSat monitoring)

Current applications: ESA (European Space Agency) / Galileo Industries – Galileo Ground Control Segment, Central Monitoring & Control Facility

Some current applications: Portuguese Navy – Singrar, Integrated System for Ship Repair and Resource Allocation

The process [Figure: Integrated Risk Plans; Activation / Maintenance; Drills; Training; Application Input]

Space Weather

Why space weather? To protect systems and people that might be at risk from space weather effects, we need to understand the causes of space weather.

Space Weather Decision Support System (SWDSS)
The third project on space weather (SW) financed by the European Space Agency (ESA). The main objective of SWDSS is to develop software capable of storing, manipulating and reacting to adverse space weather situations affecting spacecraft:
– providing tools for analyzing the collected data;
– supplying reporting facilities for systems management;
– supplying a knowledge discovery tool for nowcasting, forecasting and data mining.

Data sources and providers
– Mission telemetry (payload and/or housekeeping) data and processed data
– Mission auxiliary data, e.g. orbital coordinates, apogee and perigee crossings, station coverage and hand-over, events, 3D models, metadata
– Data available from other sources, e.g. NOAA, SIDC, SWENET, national agencies
– Data from ground-based measurements

Satellite Monitoring

Conclusion
Huge amount of aviation data:
1. Integrate data (micro and macro level)
2. Enrich data with semantics
3. Match data with techniques to discover patterns (static and streams):
   1. Anomalies
   2. Predictive
   3. Sequences
   4. Context influence
Data mining has already obtained results in other, similar domains.
Next step: data mining for aviation safety

Ernestina Menasalvas Ruiz Pedro Sousa