Finding Unusual Correlation Using Matrix Decompositions
David Skillicorn, School of Computing, Queen's University; Math and CS, Royal Military College

The problem: Detect the footprint of terrorist presence/action in large datasets.
The solution: Look for differences between the correlative structure of terrorist groups and that of other kinds of groups.
The technique: Use matrix decompositions, particularly singular value decomposition (SVD) and semidiscrete decomposition (SDD).
Why it's interesting: These techniques are good at detecting clusters, even clusters hidden inside other clusters.

THE PROBLEM

Background: Many datasets contain data that might reveal the existence and actions of terrorist groups: Echelon signal traffic, airline travel records, financial records, CCTV surveillance (e.g. London central-traffic cameras). There are ongoing issues around collection of and access to such data (Taipale). Societal attitudes are not very consistent: collecting airline travel records is considered offensive, but Echelon apparently is not. The goal is not classification (these objects are OK, these objects are suspicious) but ranking, for further downstream analysis. At this early stage, false negatives are more of a problem than false positives.

Broadly, there are three analysis strategies:

1. Consider each object (a person, a message) individually and decide whether or not it is suspicious. This assumes that there are attributes that, by themselves, are enough to cause suspicion (e.g. the CAPPS II "rooted in the community" criterion). Countermeasures: make sure that measured attribute values are innocuous (probing); take on someone else's attribute values (identity theft). In general, each object tries to hide in the noise by looking like every other object.

2. Look for sets of objects that are connected on the basis of some of their attributes: link analysis, social network analysis, "connecting the dots". Criminal groups have a distinctive structure: their members are unusually connected to each other. Terrorist groups are like this too, but are also less connected to society at large. Countermeasures: conceal the connections between the objects by making sure they share no obvious attribute values:
* decouple by using intermediaries;
* smear time factors, e.g. by using web sites or message drops.
In general, a group tries to hide in the noise by looking like all the other groups, or by not looking like a group at all.

3. Look for sets of objects that are connected in more subtle ways, because of correlation among their properties. Workable countermeasures are hard to find because:
* a group acts with a common purpose, which creates commonalities (though NOT necessarily equalities) in attribute values;
* a group has a target, and so is forced to acquire attribute values that have commonalities with the target.
Furthermore, an ordinary group tends to have highly varied correlation with other groups, and so seems much more diffuse. This approach captures the structures visible to link analysis, but can see other connections as well.

THE DATA

We consider artificial datasets of three kinds:
1. Gaussian-distributed data, to represent attributes such as locations, ages, etc.
2. Poisson-distributed data, to represent attributes such as actions during a specific time period (e.g. travel to a particular city in a particular week).
3. Sparse data, because many forms of data (e.g. travel, word usage) are naturally sparse.

In this talk: 1000 rows, normally distributed in 30 dimensions (representing dense attribute data), plus 10 extra rows normally distributed, in each dimension, around a randomly chosen row of the base dataset. This models the correlation of a terrorist cell with a target. It does not reduce the generality of the model, since omitting the target row makes only a very small change in the resulting models.
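
The slides do not give the exact generator settings, so the following is a minimal sketch of such a dataset. The within-group spread (scale=0.5) and the random seed are assumptions, and the variable names (background, group, A, t) are introduced here for illustration; later sketches reuse A and t.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, g = 1000, 30, 10               # background rows, attributes, cell size
background = rng.normal(size=(n, m))

t = rng.integers(n)                  # index of the randomly chosen target row
group = background[t] + rng.normal(scale=0.5, size=(g, m))

A = np.vstack([background, group])   # (1010, 30): rows 1000..1009 are the cell
```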

THE TECHNIQUES

Matrix decompositions, the basic idea:
* Treat the dataset as a matrix A with n rows and m columns.
* Factor A into the product of two matrices, C and F:

    A = C F

where C is n x r, F is r x m, and r is smaller than m. Think of F as a set of underlying 'real' somethings, and of C as a way of 'mixing' these somethings together to get the observed attribute values. Choosing r smaller than m forces the decomposition to represent the data more compactly.
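
As a concrete illustration (one way to obtain such a factorization, not the only one), a truncated SVD yields C and F directly; r = 10 is an arbitrary choice here:

```python
def low_rank_factors(A, r):
    """Factor A (n x m) as C @ F with C n x r and F r x m, via a truncated SVD.
    F holds r underlying components; C mixes them to approximate each row of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]

C, F = low_rank_factors(A, r=10)     # C: (1010, 10), F: (10, 30)
```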

Two matrix decompositions are useful:

Singular value decomposition (SVD): the rows of F are orthogonal axes chosen so that the maximum possible variation in the data lies along the first axis, the maximum of what remains along the second, and so on. The rows of C are coordinates in this space.

Semidiscrete decomposition (SDD): expresses A as a sum of matrices of the same shape, each of which represents a rectilinearly aligned region of similar magnitude (positive and negative). SDD creates an unsupervised ternary classification tree for objects.

(Other matrix decompositions, such as independent component analysis, can also be useful.)
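
SVD is available in any linear-algebra library; SDD is not, so here is a simplified greedy version in the spirit of the O'Leary-Peleg iteration. The seeding rule and the fixed inner-iteration count are assumptions of this sketch, not the published algorithm:

```python
def sdd_vector(s):
    """Best x in {-1,0,1}^len(s) maximizing (x.s)^2 / (x.x): sort by |s_i| and
    keep the prefix that maximizes the objective, signed to match s."""
    order = np.argsort(-np.abs(s))
    prefix = np.cumsum(np.abs(s)[order])
    J = np.argmax(prefix ** 2 / np.arange(1, len(s) + 1)) + 1
    x = np.zeros(len(s))
    x[order[:J]] = np.sign(s[order[:J]])
    return x

def sdd(A, k, inner=20):
    """Greedy rank-k SDD: A ~ sum_i d_i * outer(x_i, y_i) with ternary x_i, y_i.
    Each term marks a rectilinearly aligned region of similar magnitude."""
    R = np.array(A, dtype=float)
    terms = []
    for _ in range(k):
        y = sdd_vector(R[np.argmax(np.abs(R).sum(axis=1))])  # seed: densest row
        for _ in range(inner):                               # alternate x and y
            x = sdd_vector(R @ y)
            y = sdd_vector(R.T @ x)
        d = x @ R @ y / ((x @ x) * (y @ y))                  # best weight for x, y
        R -= d * np.outer(x, y)                              # peel off this region
        terms.append((d, x, y))
    return terms
```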

Regard each row of the data matrix as a point in m-dimensional space. SVD maps correlation to similarity, and similarity to proximity. This can be exploited in a number of ways:

1. Denoising (much more to come at this meeting). Truncating the SVD after some k dimensions removes 'noise'; this is particularly helpful when the background is Gaussian. BUT when the goal is to look for anomalies, it can be dangerous.

2. Spectral clustering (much more to come at this meeting). There are several principled ways to do this, but they give different answers.
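
A sketch of the truncation step (k = 10 here, matching the 10 dimensions used in the results below; the cut-off is otherwise a free parameter):

```python
def truncate_svd(A, k):
    """Rank-k reconstruction A_k: keep the k strongest components, drop the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ (s[:k, None] * Vt[:k, :])

A_k = truncate_svd(A, k=10)
```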

3. Distance from the origin corresponds to 'interestingness'. Consider the vector from the origin to a given object; for any other object, the dot product with this vector reflects correlation:
* positively correlated: positive dot product;
* negatively correlated: negative dot product;
* uncorrelated: zero dot product.
Since the number of objects is much larger than the number of dimensions, objects that are correlated with everything, or with nothing, must lie close to the origin. Distance from the origin is therefore a surrogate for interestingness.
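
A sketch of the interestingness score; centering the data before the SVD is a conventional choice here, not something the slides specify:

```python
U, s, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
coords = U[:, :10] * s[:10]               # row coordinates in 10 SVD dimensions
dist = np.linalg.norm(coords, axis=1)     # distance from origin as interestingness
```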

(Figure: in some number of dimensions, points close to the origin correspond to uninteresting objects, while the most interesting objects lie farthest from it.)

4. If a target is known, the points placed on the same side of the origin as the target are those that are correlated with it.
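
Using coords and the target index t from the sketches above, the same-side test is a sign check on dot products:

```python
same_side = coords @ coords[t] > 0    # positive dot product: correlated with target
print(same_side.sum(), "objects on the target's side of the origin")
```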

Criteria 3 and 4 can be exploited together: the yellow region of the figure marks the points that are far from the origin and on the same side as the target; the points outside it are the least interesting.
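
The two tests compose directly; the 1.3-times-the-median threshold is taken from the results below:

```python
keep = same_side & (dist > 1.3 * np.median(dist))
print(keep.sum(), "candidates for downstream analysis")  # cell rows are 1000..1009
```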

5. The local neighborhoods of points that are in correlated groups are different from those of ordinary points. Choose an object; do an SVD; discard the objects that are not correlated with it; repeat with the remaining objects. (The slide tabulates, round by round, the size of the set still correlated when the chosen point is a target and when it is not.)
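
A sketch of this shrinking procedure (the number of rounds and the k = 10 cut are assumptions):

```python
def correlated_shrink(A, idx, rounds=5, k=10):
    """Repeatedly SVD the surviving rows and keep those whose coordinates have a
    positive dot product with row idx; return the set size after each round."""
    keep = np.arange(A.shape[0])
    sizes = []
    for _ in range(rounds):
        B = A[keep]
        U, s, _ = np.linalg.svd(B - B.mean(axis=0), full_matrices=False)
        X = U[:, :k] * s[:k]
        ref = X[np.nonzero(keep == idx)[0][0]]   # coordinates of the chosen row
        keep = keep[X @ ref > 0]                 # row idx always survives
        sizes.append(len(keep))
    return sizes

print(correlated_shrink(A, t))   # chosen point is the target
print(correlated_shrink(A, 0))   # ordinary point, for comparison
```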

6. Weights can be added to either particular objects or particular attributes. This has the effect of moving them farther from the origin, but also of pulling correlated points along with them. This may help to reveal structure hidden inside other structure, as sketched below.

7. There is a very general technique, due to Wedderburn, for forcing particular structure into a matrix decomposition. It can be used to discount existing knowledge, so that the decomposition looks only for new information. It is hard to make use of this with artificial data.
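
Object weighting amounts to row scaling before the decomposition; the indices and the factor 3.0 below are purely illustrative:

```python
suspect_rows = [5, 17]           # hypothetical objects to emphasize
w = np.ones(A.shape[0])
w[suspect_rows] = 3.0
A_w = A * w[:, None]             # scaled rows move farther from the origin,
                                 # pulling correlated rows along in the SVD
```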

SDD looks for regions based on their magnitude and on the amount of the array they cover. Scaling the array values non-linearly tunes the technique to look for small regions of large value, or large regions of small value. SDD can be applied to:
* the data matrix A directly;
* the object correlation matrix A A';
* the correlation matrix A_k A_k', obtained by truncating an SVD at k and remultiplying to get A_k (this matrix represents the higher-order correlation present in A).
The last variant is usually the most powerful; it is called the JSS methodology.
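
A rough sketch of the JSS pipeline, reusing truncate_svd and sdd from the earlier sketches; the values of k are assumptions:

```python
A_k = truncate_svd(A, k=10)                      # higher-order correlation only
terms = sdd(A_k @ A_k.T, k=3)                    # SDD of truncated correlation matrix
codes = np.stack([x for d, x, y in terms], 1)    # ternary code per object: objects
                                                 # with equal codes share a tree leaf
```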

THE RESULTS

3D visualization of the SVD. Red boxes: terrorist group; red star: target. (Labelling done externally.)

Remove uninteresting objects: 504 objects lie at greater than the median distance from the origin (distances computed in 10 dimensions).

120 objects lie at greater than 1.3 times the median distance from the origin.

Remove uncorrelated objects: 462 objects are correlated with the target.

Remove both uninteresting and uncorrelated objects: 226 objects are correlated with the target and lie at greater than the median distance from the origin.

64 objects are correlated with the target and lie at greater than 1.3 times the median distance from the origin: roughly 6% of the data, with all of the terrorist group still included.

Positions from the SVD, labelling from SDD. The terrorist group members have similar labels, but they cannot be picked out.

Positions from the SVD, labelling from SDD applied to the truncated correlation matrix. Now the terrorist group is detected by the technique! N.B. the blue objects: potential aliases.

These pictures are from an easy (but still not trivial) dataset. The results extend to:
* much more complex normally distributed data;
* Poisson-distributed data (modelling e.g. travel patterns);
* sparse data (up to 98% sparse);
* binary data;
* binary sparse data;
although performance decreases as the amount of information in the data drops off.

Choosing appropriate attributes in this domain is hard. The dataset must contain some attributes for which the correlation among the terrorist group is different from the correlation in the general population. The number of such attributes required:
* does not depend on the total number of attributes or the total number of objects;
* depends on how much smaller the variance within the terrorist group is than the variance of the whole population.
For example, when the variance within the group is about half that of the general population, about 20 such attributes are needed; for smaller variance, fewer are needed. Adding attributes that merely might be useful does not hurt detection rates.

Summary: Detecting terrorist signatures in large datasets is difficult because terrorist groups actively attempt to avoid leaving detectable traces. Correlation is the hardest form of signature to avoid because:
* it is forced by common purpose, and
* it is hard to see 'from the inside', especially as an individual;
so it seems the most fruitful direction for detection techniques. Matrix decompositions such as SVD and SDD are helpful in detecting and visualizing such correlation. However, they are best used as a triage mechanism, in preparation for more sophisticated downstream analysis techniques.

Related material:
D.B. Skillicorn, Detecting Related Message Traffic, Workshop on Link Analysis, Counterterrorism, and Privacy, SIAM International Conference on Data Mining 2004, pp. 39-48, April 2004.
D.B. Skillicorn, Applying Matrix Decompositions to Counterterrorism, School of Computing, Queen's University, Technical Report, May 2004, 47pp.
The Athens system for novel information discovery.
Data mining in sensor networks with correlated values.

Watch for… Workshop on Link Analysis, Counter-terrorism, and Privacy, SIAM International Conference on Data Mining 2005, Sutton Place Hotel, Newport Beach, California, April 20-23, 2005 (submission date around Jan 2005).