Data Mining with AURA Jim Austin University of York & Cybula Ltd.

1 Data Mining with AURA Jim Austin University of York & Cybula Ltd

2 22 Oct Overview. AURA: background to AURA; brief overview of its components; its implementation. AURA within UK e-Science: what is e-Science; the DAME pilot project; use of AURA in DAME. GRID issues in DM.

3 22 Oct The AURA Technology. Neural network based associative storage. Set of tools to build fast pattern recognition systems. Aimed at unstructured data. Aimed at large datasets. Scalable technology.

4 22 Oct AURA as a basis for search The game is to remove the chaff using AURA. Later processes find the exact match.

5 22 Oct The storage system Correlation Matrix Memory based Exploits threshold logic methods Uses distributed encoding of information Implemented using binary weights for efficient software and hardware implementation

6 22 Oct [Diagram: inputs, weights, threshold T]

7 22 Oct Why is it fast? Access only rows that are activated by inputs. Inputs are made as sparse as possible and of fixed weight (a fixed number of set bits). Only need to sum over active rows (bit vectors), which is ideal for most processors. Great for bit vector machines (DAP!).
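
The storage and recall behaviour described on slides 5 to 7 can be sketched in a few lines. This is a minimal illustration, not the AURA C++ library: the class name `BinaryCMM`, its methods, and the use of Python integers as bit-vector rows are all assumptions made for the example.

```python
class BinaryCMM:
    """Toy binary correlation matrix memory (illustrative only, not the AURA API)."""

    def __init__(self, n_in, n_out):
        self.n_in = n_in
        self.n_out = n_out
        # One row of binary weights per input line, held as an integer bit mask.
        self.rows = [0] * n_in

    def train(self, in_bits, out_bits):
        """Associate a set of active input indices with a set of active output indices."""
        out_mask = 0
        for j in out_bits:
            out_mask |= 1 << j
        for i in in_bits:
            # Binary weights: a weight is set once and stays set (logical OR).
            self.rows[i] |= out_mask

    def recall(self, in_bits, threshold):
        """Sum only the rows selected by active inputs, then apply the threshold."""
        sums = [0] * self.n_out
        for i in in_bits:  # inactive rows are never read
            row = self.rows[i]
            for j in range(self.n_out):
                sums[j] += (row >> j) & 1
        return {j for j, s in enumerate(sums) if s >= threshold}


cmm = BinaryCMM(n_in=8, n_out=6)
cmm.train({0, 3, 5}, {1, 4})
cmm.train({1, 3, 6}, {0, 2})
print(cmm.recall({0, 3, 5}, threshold=3))
```

Because every input pattern here has the same number of set bits, the threshold can simply be set to that number for an exact recall, or lowered to tolerate missing input bits.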

8 22 Oct Use of the CMM [Diagram labels: Data, Query, CMM system, Data subset, Slow algorithm, Final data]
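
The filter-then-verify arrangement on this slide can be sketched as follows. This is a hedged illustration: a cheap bigram-overlap score stands in for the CMM, and a substring test stands in for the slow algorithm. The function names and the sample records are hypothetical.

```python
def bigrams(text):
    """Set of adjacent character pairs in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def two_stage_search(query, records, min_shared):
    """Cheap filter first (stand-in for the CMM), exact check second."""
    q = bigrams(query)
    # Stage 1: keep only records sharing enough bigrams with the query.
    subset = [r for r in records if len(q & bigrams(r)) >= min_shared]
    # Stage 2: the slower, exact test runs on the reduced subset only.
    final = [r for r in subset if query in r]
    return subset, final


records = ["gas turbine blade", "turbo pump", "fuel line", "turbine"]
subset, final = two_stage_search("turbine", records, min_shared=3)
print(subset)
print(final)
```

The first stage may let false positives through (here "turbo pump"), but as long as `min_shared` does not exceed the number of bigrams in the query it cannot drop a record that contains the query, so the second stage sees every true match.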

9 22 Oct CMM system [Diagram labels: Pre-process, Operations, Prepare data, CMM system, Post process]

10 22 Oct Pre-processing. Implements a number of pre-processors: N-grams for text strings; CMAC for numeric data; graphs for images and graphics; tokens for logical data; quantisation for time series.
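
Two of the pre-processors listed above, N-grams and quantisation, can be sketched as functions that turn raw data into a sparse set of active bit positions. This is an assumption-laden sketch rather than the AURA implementation: the function names, the CRC32 hashing of N-grams onto bit positions, and the default vector width are all choices made for the example. CMAC and graph coding are not shown.

```python
import zlib


def ngram_bits(text, n=3, width=256):
    """Map each character n-gram of the text to one bit position in [0, width)."""
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    # CRC32 is used only as a deterministic hash; any stable hash would do.
    return {zlib.crc32(g.encode("utf-8")) % width for g in grams}


def quantise_bits(value, lo, hi, bins):
    """Map a numeric value to a single active bin (one set bit out of `bins`)."""
    if hi <= lo or bins < 1:
        raise ValueError("need hi > lo and bins >= 1")
    idx = int((value - lo) / (hi - lo) * bins)
    return {min(bins - 1, max(0, idx))}  # clamp out-of-range values to the end bins


print(sorted(ngram_bits("engine")))
print(quantise_bits(0.5, 0.0, 1.0, 4))
```

Strings that share N-grams share set bits, which is what lets a later matching stage score partial matches by counting overlapping bits.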

11 22 Oct Post processing Data selected by the CMM must be accessed quickly. Uses best bit index method to match output data and recover stored data.

12 22 Oct Implementation The AURA C++ library Implemented on PC or workstation Beowulf parallel cluster Origin 2000 supercomputer Bespoke hardware

13 22 Oct AURA parallel implementation 28 dedicated PCI based processors Beowulf configuration 3.5Gb memory size Cortex-1

14 22 Oct UK eScience. Aims to build on the concept of Grids: to make computing and data provision as direct and simple as electrical power delivery. £110M initiative started 18 months ago. DAME is a £3.5M pilot project to demonstrate its application in the engineering field.

15 22 Oct DAME Objectives DAME: Distributed Aircraft Maintenance Environment. Demonstrate diagnostic capability on the GRID Examine timeliness properties of the GRID Demonstrate on the RR Aeroengine diagnostic problem

16 22 Oct Rolls-Royce University of Oxford, Lionel Tarassenko. University of Leeds, Peter Dew, Alison McKay. York, J Austin, J McDermid, A Wellings. University of Sheffield, P Fleming. Rolls-Royce, Derby. Data Systems & Solutions. Cybula Ltd.

17 22 Oct Engine flight data Airline office Maintenance Centre European data center London Airport New York Airport American data center Grid Diagnostics centre

18 22 Oct Diagnostic issues. The system must analyse and report: novel engine operation; identify any cause of events; do this quickly. Data: large (many Tb).

19 22 Oct Data – Zmod plots

20 22 Oct How does AURA contribute? Search technology for multi-media data. Parallel pattern match engine based on neural networks. Built on Correlation Matrix Memories. High performance Beowulf and dedicated hardware implementations. Commercially sold by Cybula Ltd.

21 22 Oct Quote Novelty indication Data used to identify novelty Data reduction processes Features Data stores/ data warehouse Diagnostic station Engine data Data to be searched for Pattern match results Match requests AURA-G GRID Diagnosis

22 22 Oct Simple example of processing chain [Diagram labels: Data sample, DM coding, CMM, Matching previous events]

23 22 Oct Typical pre-processing DM coding (1 up and 0 down) Fast Preserves information Produces a binary vector Time Frequency
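
The "1 up and 0 down" coding on this slide can be sketched as below, assuming that "DM" denotes a delta-modulation-style code that records only whether each sample rose relative to the previous one. The function name `dm_code` and the treatment of equal neighbours as "down" are assumptions for the example.

```python
def dm_code(samples):
    """Return one bit per step: 1 if the signal went up, 0 otherwise."""
    # Equal neighbouring samples are coded as 0 here; that tie rule is an assumption.
    return [1 if b > a else 0 for a, b in zip(samples, samples[1:])]


print(dm_code([1, 3, 2, 2, 5]))
```

A series of N samples yields N - 1 bits, so the output is already a binary vector of the kind a CMM takes as input.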

24 22 Oct AURA-G This is a Globus enabled AURA implementation. Developed under DAME Will be available end of 2002 for use in other problems.

25 22 Oct AURA-G Support of scalable pattern matching Supports distributed search, across multiple CMM engines at different sites OGSA compliant
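
The idea of one query fanned out to several CMM engines at different sites can be sketched as a scatter-gather. This is only an illustration: local Python functions stand in for remote engines, no Globus or OGSA calls are made, and the site names, the `make_engine` helper, and the substring matching are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_engine(site, shard):
    """Build a stand-in search engine that holds one site's share of the data."""
    def search(query):
        return [(site, record) for record in shard if query in record]
    return search


def distributed_search(query, engines):
    """Send the same query to every engine in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # map() returns results in engine order, so the merge is deterministic.
        partials = list(pool.map(lambda engine: engine(query), engines))
    return [hit for part in partials for hit in part]


engines = [
    make_engine("site-a", ["vibration spike", "normal run"]),
    make_engine("site-b", ["spike in zmod plot", "idle"]),
]
hits = distributed_search("spike", engines)
print(hits)
```

Each engine searches only its own data, so the data stays at its site and only the query and the matches travel.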

26 22 Oct Grid Issues in Data Mining. Data provenance. Standards: data transparency independent of location; managing DB/Data mining link in distributed system; OGSA-DAI.

27 22 Oct Conclusions AURA is a mature component for data search and retrieval Robust software and hardware implementation available Applications in e-Science for Grid applications underway

28 22 Oct Contacts Jim Austin Dept Computer Science, University of York, York, YO10 5DD Cybula Ltd DAME :
