Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Supporting further and higher education 19th APAN Meetings in Bangkok Innovative Uses of Pervasive Broadband Network Is adoption of technology running.
Scaling distributed search for diagnostics and prognostics applications Prof. Jim Austin Computer Science, University of York UK CEO Cybula Ltd.
Using search for engineering diagnostics and prognostics Jim Austin.
Jim Austin University of York & Cybula Ltd
Rolls-Royce supported University Technology Centre in Control and Systems Engineering UK e-Science DAME Project Alex Shenfield
Decision Support Tools CBR & Modeling Jeff Allan University of Sheffield.
Jim Austin, University of York Grid-based on-line aeroengine diagnostics.
ELPUB 2006 June Bansko Bulgaria1 Automated Building of OAI Compliant Repository from Legacy Collection Kurt Maly Department of Computer.
Towards the Design and Implementation of the DAME prototype: OGSA Compliant Grid Services on the White Rose Grid Sarfraz A Nadeem University of Leeds.
Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India.
Grid Enabled Pattern Matching within the DAME e-Science Pilot Project Jim Austin Computer Science University of York.
CoLaB 22nd December 2005 Secure Access to Service-based Collaborative Workflow for DAME Duncan Russell Informatics Institute University of Leeds, UK.
1 CS 430: Information Discovery Lecture 4 Data Structures for Information Retrieval.
UK e-Science and the White Rose Grid Paul Townend Distributed Systems and Services Group Informatics Research Institute University of Leeds.
A Distributed Data Architecture Mark Jessop University of York.
DAME, EuroGrid WP3 and GEODISE Esa Nuutinen. Introduction Dame, EuroGrid WP3 and GEODISE All are Grid based tools for Engineers. Many times engineers.
Business Intelligence
L. Padmasree Vamshi Ambati J. Anand Chandulal J. Anand Chandulal M. Sreenivasa Rao M. Sreenivasa Rao Signature Based Duplicate Detection in Digital Libraries.
AdvisorStudent Dr. Jia Li Shaojun Liu Dept. of Computer Science and Engineering, Oakland University 3D Shape Classification Using Conformal Mapping In.
DAME: A Distributed Diagnostics Environment for Maintenance Professor Jim Austin/Dr Tom Jackson University of York.
Pattern Matching in DAME using AURA technology Jim Austin, Robert Davis, Bojian Liang, Andy Pasley University of York.
CS e-Science & Grid Computing - introduction - What is e-Science? What is the Grid? Grid middleware.
Microsoft Research Faculty Summit Paul Watson Professor of Computer Science Newcastle University, UK.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
1 1 Slide Introduction to Data Mining and Business Intelligence.
DAME: Distributed Engine Health Monitoring on the Grid
CBR for Fault Analysis in DAME Max Ong University of Sheffield.
CLOUD BASED MACHINE LEARNING APPROACHES FOR LEAKAGE ASSESSMENT AND MANAGEMENT IN SMART WATER NETWORKS Dr. Steve Mounce, Ms. Catalina Pedroza, Dr. Tom Jackson,
Using String Similarity Metrics for Terminology Recognition Jonathan Butters March 2008 LREC 2008 – Marrakech, Morocco.
DAME: The route to commercialisation Tom Jackson University of York.
UK e-Science DAME Project | UK DTI BROADEN Project All Hands Meeting19 th September 2006 Proxim-CBR: A Scalable Grid Service Network for Mobile Decision.
Module 5 Planning for SQL Server® 2008 R2 Indexing.
DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENNTS Leon Guzenda Chief Technology Officer.
Distributed Aircraft Maintenance Environment - DAME DAME Workflow Advisor Max Ong University of Sheffield.
Max Ong University of Sheffield, UK. AHM 2004 Session 2.3: Workflow Composition, Wednesday 1 st September 2004, 4pm. Workflow Advisor in DAME Abstract.
DAME Dependability and Security Study: Progress Report Howard Chivers University of York Practical Security for e-Science Projects 25 November 2003.
The DAME project Professor Jim Austin University of York.
Performance Evaluation of a SNAP-based Community Resource Broker Mohammed H. Haji, Peter Dew, Karim Djemame and Iain Gourlay.
DAME: A Distributed Diagnostics Environment for Maintenance Duncan Russell University of Leeds.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
1 Biometric Databases. 2 Overview Problems associated with Biometric databases Some practical solutions Some existing DBMS.
1 CS 430: Information Discovery Lecture 4 Files Structures for Inverted Files.
Efficient Local Statistical Analysis via Integral Histograms with Discrete Wavelet Transform Teng-Yok Lee & Han-Wei Shen IEEE SciVis ’13Uncertainty & Multivariate.
DAME: A Distributed Diagnostics Environment for Maintenance Dr Tom Jackson University of York.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
March 31, 1998NSF IDM 98, Group F1 Group F Multi-modal Issues, Systems and Applications.
1 Limitations of BLAST Can only search for a single query (e.g. find all genes similar to TTGGACAGGATCGA) What about more complex queries? “Find all genes.
Overview of the DAME Project Distributed Aircraft Maintenance Environment University of York Martyn Fletcher.
Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #15 Secure Multimedia Data.
LogTree: A Framework for Generating System Events from Raw Textual Logs Liang Tang and Tao Li School of Computing and Information Sciences Florida International.
Automatic Discovery and Processing of EEG Cohorts from Clinical Records Mission: Enable comparative research by automatically uncovering clinical knowledge.
D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun.
Access Control for Dynamic Virtual Organisations Duncan Russell, Peter Dew & Karim Djemame University of Leeds.
Why BI….? Most companies collect a large amount of data from their business operations. To keep track of that information, a business and would need to.
A Scalable Service Architecture for Distributed Search Mark Jessop University of York.
DataJewel 1 : Tightly Integrating Visualization with Temporal Data Mining Mihael Ankerst, David H. Jones, Anne Kao, Changzhou Wang 1 US patent pending.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
LOGO Cleaner, more efficient convential fuels. CR&D and Feasibility Competition Innovate UK Brokering Event 11 th February 2015 Manchester.
ScotGRID is the Scottish prototype Tier 2 Centre for LHCb and ATLAS computing resources. It uses a novel distributed architecture and cutting-edge technology,
SIMILARITY SEARCH The Metric Space Approach
MATLAB Distributed, and Other Toolboxes
National e-Infrastructure Vision
Query in Streaming Environment
Big Data - in Performance Engineering
Research Issues in Electronic Commerce
A survey of network anomaly detection techniques
Deep Neural Networks for Onboard Intelligence
Grid Application Model and Design and Implementation of Grid Services
Data Mining: Introduction
Presentation transcript:

Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York

Distributed Aircraft Maintenance Environment - DAME Overview Distributed Aircraft Maintenance Environment (DAME) project Vibration data and search problem AURA strategy Architecture and storage Demonstration using signal data explorer Future challenges

Distributed Aircraft Maintenance Environment - DAME Project Partners EPSRC Funded, £3.2 Million, 3 years, commenced Jan UK pilot project for e-Science 4 Universities: –University of York, Dept of Computer Science –University of Sheffield, Dept of Automatic Control and Systems Engineering –University of Oxford, Dept of Engineering Science –University of Leeds, School of Computing and School of Mechanical Engineering Industrial Partners: –Rolls-Royce –Data Systems and Solutions –Cybula Ltd

Distributed Aircraft Maintenance Environment - DAME Operational Scenario Engine flight data Airline office Maintenance Centre London Airport New York Airport GRID Diagnostics Centre Engine flight data European data centre Engine flight data Airline office Maintenance Centre London Airport New York Airport US data centre GRID Diagnostics Centre Engine flight data

Distributed Aircraft Maintenance Environment - DAME DAME Grid Challenges Building a demonstration system as proof of concept for Grid technology in the aerospace diagnostic domain Two primary Grid challenges: –Management of large, distributed and heterogeneous data repositories –Rapid data mining and analysis of fault data Other key (commercial) issues: –Remote, secure access to flight data and other operational data and resources –Management of distributed users and resources –Quality of Service issues (and Service Level Agreements )

Distributed Aircraft Maintenance Environment - DAME Demonstrator Fully operational system on the WRG –Demonstrated the basic system architecture and main services Maintenance Analyst Maintenance Engineer Toolbench

Distributed Aircraft Maintenance Environment - DAME White Rose Grid Distribution

Vibration data and search problem

Distributed Aircraft Maintenance Environment - DAME Z-mod Data Vibration data from sensors forms Z-mod data. Tracked orders extracted from Z-mod data Frequency Time Tracked order Time Amplitude

Distributed Aircraft Maintenance Environment - DAME Pattern Matching Problem Collected vibration data from all engines in flight Detect unusual events on recent flights –QUICK on wing statistical classifier system Search for similar events on other engines –Uses AURA pattern matching methods to search large vibration data sets Reason using historical data and search results –CBR tools which access service records

Distributed Aircraft Maintenance Environment - DAME Novelty Novelty or anomaly identified in tracked order data by QUICK Forms query sub- sequence

Distributed Aircraft Maintenance Environment - DAME Search Problem Search for sub-sequences similar to the query in a large volume of tracked order data. –Need to investigate all possible alignments –Benchmark method is sequential scan –Noisy data: imprecise matching required –Various possible similarity measures Euclidian distance Correlation

Distributed Aircraft Maintenance Environment - DAME AURA Family of generic techniques for pattern matching using Correlation Matrix Memories (CMMs) Proven technology for searching large data sets –Sclable high performance –Find exact and near-matches –Wide range of data types –Can be parallelised Operation –Takes vectors and compares them to stored examples –Uses bit level comparison methods and binary matrix operations.

Distributed Aircraft Maintenance Environment - DAME AURA Technology Data Data Adaptor Store Search Input pattern Candidate Engine (Back check) Indexer Output pattern AURA SearchEngine Results binary Store & Search Indexes or Data ResultStore AURA Search Process

Distributed Aircraft Maintenance Environment - DAME Data Storage & Recall Input pattern Output pattern AURA SearchEngine binary ** Correlation Matrix Memories

Distributed Aircraft Maintenance Environment - DAME AURA Encoding Application specific encoding required for efficient searching –Similarity metric –Integer bins –Reduction in dimensionality –Can integrate traditional methods

Distributed Aircraft Maintenance Environment - DAME Performance Fast method of discarding poor matches AURA search roughly 30x faster than sequential scan Candidate matches typically <1% of total Back check stage very efficient due to reduction in volume of data –Typically 1% or less of processing time for full sequential scan.

Distributed Aircraft Maintenance Environment - DAME Architecture and Storage Terabytes per year of raw zmod data –Access is required by many DAME services 1Tb per year of tracked orders that need to be searched against –Access required by Signal Data Explorer Observed in a distributed manner –Delivery to a central repository makes high bandwidth requirements

Distributed Aircraft Maintenance Environment - DAME Objectives Distributed search –Transparent Distribution of search and collation of results –Efficient Use of processing and communications resources –Extensible Permit addition / removal of resource –Concurrent Support multiple simultaneous searches

Distributed Aircraft Maintenance Environment - DAME Objectives Generic mechanisms –Suitable for different types of time series data and a variety of search methods Robust architecture –Graceful degradation when some components unavailable –Provision of intermediate results before all searching completed

Distributed Aircraft Maintenance Environment - DAME Design Pattern match controller (PMC) service –Controls distribution and collation of the search –Generic service –Simple interface –Minimal communications overheads Pattern matching service –Performs the search –Can be implemented in a variety of ways –Conforms to a simple API Storage resource broker (SRB) –Used to store and retrieve data and metadata –Provides a single logical view onto all stored data

Distributed Aircraft Maintenance Environment - DAME Physical Architecture Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Pattern Match Controller Pattern Matching Service SRB Search Node Data Explorer Work Station Grid Computers on the Grid

Distributed Aircraft Maintenance Environment - DAME Heathrow Signal Data Explorer (Client Application) Master Node SRB Slave Node Gatwick Search(…) NodeSearch(…) PMC Pattern Match Engine SRB Search(…) PMC Pattern Match Engine Search Process

Distributed Aircraft Maintenance Environment - DAME Heathrow Signal Data Explorer (Client Application) Master Node SRB Slave Node Gatwick Pattern Match Engine SRB PMC Pattern Match Engine Search Process ReturnResults (…) GetResults(…) PMC

Distributed Aircraft Maintenance Environment - DAME Implementation PMC –Java GT3 Grid service –Hosted within a Tomcat installation Pattern matching service –Communicates with PMC using proprietary encoding –Uses SRB client library to access data Storage resource broker (SRB) –SRB server running at all WRG sites –Single metadata catalogue (MCAT) hosted at York

Distributed Aircraft Maintenance Environment - DAME APIs Client developers need only use simple store(), search() and getResults() API calls. Pattern matching service developers need only implement a simple interface of search() and store(), and use the returnResults() API call.

Distributed Aircraft Maintenance Environment - DAME Summary Transparent –PMC distributes search to multiple pattern matching services. –Results collated and returned to Data Explorer Efficient –PMC has minimal overheads –SRB handles used to identify results – minimal communications bandwidth required for search

Distributed Aircraft Maintenance Environment - DAME Summary Extensible –PMC uses a distributed catalogue of other PMC locations Permits simple addition/removal of search nodes Concurrent –PMC uses unique search ids based on master PMC id –Results kept for a time to allow access from other workstations

Distributed Aircraft Maintenance Environment - DAME Summary Generic mechanisms –PMC interface independent of type of time series data searched or algorithms used –Generic SRB handles used to identify data to search and results Robust architecture –High availability as clients may use any PMC node as master for a search. –Temporary results built up and may be accessed before entire search complete –Partial results in event of unavailable nodes –Automatic clean-up after timeout

Distributed Aircraft Maintenance Environment - DAME Signal Data Explorer Tool to allow investigation of data outside of an automatic workflow by a domain expert Accesses local data stores or remote (distributed) data sets and searching services. For this demo, searches against data held on the White Rose Grid at York, Leeds and Sheffield

Distributed Aircraft Maintenance Environment - DAME Next Steps Scaling trials on engine data –Realistic number of concurrent users supported –Investigate performance as the number of nodes and/or volume of data is increased Compare overhead in search time / network requirements to a centralised architecture Federated MCATs –Create several SRB zones each with a metadata catalogue

Distributed Aircraft Maintenance Environment - DAME Complex Search Requests Search Requests –May contain several query patterns to be matched against. The results for one pattern may constrain the search space for another query pattern –If treated as several individual queries may not be processed efficiently A result may consist of data stored at more than one node –Slave nodes may be required to issue sub-searches to other nodes in the system