Scalable Deep Learning Through Fuzzy-based Clustering in Autonomous Systems
Ganapathy Mani, Bharat Bhargava, Jason Kobes*
CS & CERIAS, Purdue University; *Northrop Grumman Corporation
AIKE 2018

Intelligent Autonomous Systems
Autonomous systems should be:
Able to perform complex tasks with limited or no ongoing connection to humans.
Cognitive enough to act without a human's judgment lapses or execution inadequacies.
Intelligent Autonomous Systems (IAS) are characterized as highly cognitive, effective in knowledge discovery, reflexive, and trusted.
The focus of this research is on smart cyber systems.

Comprehensive IAS Architecture
[Architecture diagram: adaptive action and anomaly detection components.]

Motivation
Autonomous systems receive continuous streams of diverse data from numerous sources. Disregarding new and unknown data, or broadly classifying them into a few categories, would create an inadequate learning environment.
IAS should be trained to work with:
Metadata, limited data, incomplete data, and unknown (new) data
Dynamic, unpredictable, and adversarial environments

Scalable Learning
Scalable learning is a method for achieving maximum classification coverage without rejecting as anomalies the unknown data items that were not present in the training or testing datasets.

Bitwise Fuzzy-based Clustering (BFC)
BFC is implemented through perfect error-correcting codes, or Golay codes. Error-correcting codes (ECC) are used to control errors in data (any information that can be represented in 0/1 bits). When data is distorted, an error-correcting code can approximately match the distorted data to the original.
For example, take the message m = 000 and its 1-bit distortions: 100, 010, and 001. All three distortions are at Hamming distance 1 from 000 (it takes one bit flip to get back to 000), so they can easily be corrected.
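As a quick illustration of this nearest-match idea, here is a minimal Python sketch (not from the paper) that corrects the 1-bit distortions above by picking the code word at the smallest Hamming distance:

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

CODE_WORDS = ["000", "111"]

def correct(received):
    """Map a (possibly distorted) 3-bit word to the nearest code word."""
    return min(CODE_WORDS, key=lambda cw: hamming_distance(received, cw))

for distorted in ["100", "010", "001"]:
    print(distorted, "->", correct(distorted))   # all three decode to 000
```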

Why Error Correcting Codes?
BFC creates clusters from fuzzily (approximately) matched data items, similar to error correction. For example, take the message m = 000 as a data item: its 1-bit distortions (100, 010, and 001) are clustered together with it.
0/1 bits are used to label binary features: 0 means a feature is absent, 1 means it is present. Based on the number of features of a data item, we can create a binary classification. For example, a data item D with 3 features yields a code word such as 101 from the presence or absence of each feature, and that code word becomes the label for the data item (see the sketch below).
Using ECC provides scalability (2^n combinations of clusters) and fault tolerance (distorted labels can still be clustered correctly).
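A tiny sketch of this labeling step (the 3-feature '101' example is from the slide; the feature names in the comment are made up):

```python
def feature_label(feature_flags):
    """Concatenate 0/1 feature flags into a bit-string label, e.g. [1, 0, 1] -> '101'."""
    return "".join("1" if flag else "0" for flag in feature_flags)

# Data item D with 3 binary features (names such as has_wheels, has_wings, is_red are hypothetical).
print(feature_label([1, 0, 1]))   # '101'
```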

Perfect Error Correction Codes - BFC
Message = 0: code word = 000, additional bits for error correction = 00 (hash index).
Message = 1: code word = 111, additional bits for error correction = 11 (hash index).

Perfect Error Correction Codes - BFC
[Figure: 3-bit cube with code words 000 and 111 at opposite corners; the remaining corners (001, 010, 100, 011, 101, 110) are correction vectors, decoded by majority (2-of-3) approximation.]
The Hamming distance between the two code words is 3: you need to flip 3 bits to go from 000 to 111.
The code is written as [no. of bits (Mn), message size (Dk), Hamming distance (Hd)] = [3, 1, 3].
This is the simplest perfect Hamming code; a code is perfect when all the correction vectors are used.
When there is a 1-bit distortion, say 010, it is correctly decoded by moving 1 Hamming distance back to 000, so 1-bit errors can be corrected.
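Putting the [3, 1, 3] pieces together, here is a short sketch (standard repetition coding, assumed rather than taken from the paper) of encoding one message bit and decoding a distorted word by 2-of-3 majority:

```python
def encode_313(message_bit):
    return str(message_bit) * 3                    # 0 -> '000', 1 -> '111'

def decode_313(received):
    return 1 if received.count("1") >= 2 else 0    # majority (2/3) approximation

assert decode_313("010") == 0   # 1-bit distortion of 000 is corrected
assert decode_313("101") == 1   # 1-bit distortion of 111 is corrected
print(encode_313(1), "->", decode_313("110"))      # 111 -> 1
```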

Perfect Error Correction Codes - BFC
Similar to the (3, 1, 3) code, (23, 12, 7) is a perfect code. We use it for BFC, giving 2^23 combinations of features!

Perfect (23, 12, 7) Code - BFC
The 3-dimensional cube of the (3, 1, 3) code becomes a 23-dimensional binary hypercube for (23, 12, 7), which looks like the sphere shown below. Each point is a corner of the hypercube structure and has 7 connections to its 3-bit distortions.
[Figure: 23-dimensional binary hypercube rendered as a sphere of points.]
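The error-correcting capacity follows from the standard coding-theory relation t = (d - 1) / 2 rounded down (a general fact, not specific to this paper), which the snippet below checks for both codes:

```python
# A code with minimum Hamming distance d corrects up to t = (d - 1) // 2 bit errors.
for name, d in [("[3, 1, 3] repetition", 3), ("[23, 12, 7] Golay", 7)]:
    print(f"{name}: corrects up to {(d - 1) // 2} bit error(s)")
# [3, 1, 3] repetition: corrects up to 1 bit error(s)
# [23, 12, 7] Golay: corrects up to 3 bit error(s)
```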

Creating Indexes by Hashing - BFC
Take the example of a (7, 4, 3) code: 4 is the size of the message, so 2^4 = 16 messages are possible. A hash index is created by adding 3 random bits at the end. For example, 1100 is a message and 1100101 is the code word (4 message bits + 3 hash bits = 7 bits).
Similarly, BFC hash indexes are created from a 12-bit message plus an 11-bit hashing tag (23 bits in total). This 23-bit binary vector, indicating whether each feature of a data item is present or absent, is used as a label for clustering those data items (see the sketch below).
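A hedged sketch of this index construction (the random-tag scheme follows the slide's description; the function name and example messages are mine):

```python
import random

def make_label(message_bits, tag_len, rng=random):
    """Append tag_len random hash bits to the message bits to form the cluster label."""
    tag = "".join(rng.choice("01") for _ in range(tag_len))
    return message_bits + tag

print(make_label("1100", 3))            # 7-bit label, e.g. '1100101'
print(make_label("110010110001", 11))   # 23-bit BFC label (12 message bits + 11 tag bits)
```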

Why Clustering?
Learning through sampling can only generalize: new data items, or items that appear only in small numbers, may be ignored. Clustering can group even a small number of data items.

Clustering by BFC - (3, 1, 3) Example
[Figure: the (3, 1, 3) cube with the code word 000 as the cluster center and its 1-bit distortions (100, 010, 001) at Hamming distance D = 1 forming a sub-cluster.]
The 3-bit code word 000 is the center of the cluster. Similarly, BFC will have 12-bit centers.
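This (3, 1, 3) clustering can be reproduced in a few lines (a sketch, not the paper's implementation): every 3-bit word falls into the cluster of its nearest code-word center, so each center fuzzily absorbs its 1-bit distortions.

```python
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

CENTERS = ["000", "111"]
clusters = {c: [] for c in CENTERS}
for word in ("".join(bits) for bits in product("01", repeat=3)):
    clusters[min(CENTERS, key=lambda c: hamming(word, c))].append(word)

print(clusters)
# {'000': ['000', '001', '010', '100'], '111': ['011', '101', '110', '111']}
```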

Clustering by BFC - (23, 12, 7)
In the (23, 12, 7) BFC model, a binary feature-vector template is created, and 0 or 1 is encoded based on the absence or presence of each feature. Consider an image data item with the template below. Just as in (3, 1, 3) clustering, the first 12 bits of the 23-bit vector form the center of the sphere, and distortions in the remaining 11 bits are fuzzily matched to the cluster (see the sketch after the feature list).
Features (YES/NO):
F1: is it red?
F2: is it female?
...
F22: is it tall?
F23: is it animal?
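A sketch of how such a template could produce the 23-bit vector (questions F3-F21 are placeholders; F1, F2, F22, and F23 come from the slide):

```python
QUESTIONS = (["F1: is it red?", "F2: is it female?"]
             + [f"F{i}: placeholder feature?" for i in range(3, 22)]
             + ["F22: is it tall?", "F23: is it animal?"])

def to_bit_vector(answers):
    """answers: 23 booleans -> 23-character bit string used as the cluster label."""
    assert len(answers) == len(QUESTIONS) == 23
    return "".join("1" if a else "0" for a in answers)

example_answers = [True, False] + [False] * 19 + [True, True]
print(to_bit_vector(example_answers))   # prints a 23-character bit string
```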

Clustering by BFC
[Figure-only slide.]

Clustering by BFC
We simulated BFC with 2^22 = 4,194,304 vectors for the vector representation (2^23 can provide 8,388,608 vectors). Hash creation and fuzzy matching depend on comparison operations, where each bit in the 23-bit vector is randomly initialized and clustered.
We take the following into account for performance evaluation:
Recall
Clock cycles required for computing hashes (encoding)

Evaluation - Recall
Recall = (relevant items ∩ retrieved items) / relevant items.
For higher Hamming distances, the recall probabilities are small. "1-1" means direct matching of a 1-Hamming-distance item with the cluster of the same Hamming distance.
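For reference, the recall measure the slide quotes looks like this in code (illustrative sets only):

```python
def recall(relevant, retrieved):
    """Recall = |relevant ∩ retrieved| / |relevant|."""
    return len(set(relevant) & set(retrieved)) / len(set(relevant))

print(recall({"a", "b", "c", "d"}, {"b", "c", "e"}))   # 0.5
```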

Evaluation - CPU Performance
Number of clock cycles for encoding. We used the process thread API to collect the data, sampled per 1000k instructions.
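The slides count clock cycles through a process-thread API; as a rough, portable stand-in, one could time the encoding step with CPU time instead (a sketch under that assumption, with a placeholder encoder):

```python
import random
import time

def encode(bits):
    # Placeholder for the 23-bit hash-index encoding step.
    return int("".join(bits), 2)

vectors = [[random.choice("01") for _ in range(23)] for _ in range(100_000)]
start = time.process_time()
for v in vectors:
    encode(v)
print("CPU seconds for 100k encodings:", time.process_time() - start)
```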

Evaluation - Time Complexity
Number of clock cycles for encoding. We used the process thread API to collect the data, sampled per 1000k instructions.

Future Work
Use deep-learning methodologies such as Convolutional Neural Networks (CNNs) to further classify the data stored in the clusters and enhance learning:
Clustered data is transformed into n x 23 dimensional matrices.
Convolutional features are extracted (sampling and sliding-window techniques can also be used).
A CNN is applied to extract further insights from the classified data (see the sketch below).
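A minimal sketch of the kind of CNN this future work describes, assuming PyTorch, a hypothetical n = 64 rows per cluster matrix, a batch of 8, and a made-up number of output classes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # sliding-window style down-sampling
    nn.Flatten(),
    nn.Linear(16 * 32 * 11, 10),                 # 64x23 input -> pooled 32x11 maps -> 10 classes
)

x = torch.randint(0, 2, (8, 1, 64, 23)).float()  # 8 cluster matrices of 0/1 entries
print(model(x).shape)                            # torch.Size([8, 10])
```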

Future Work
[Figure-only slide.]

Thank you!!!