Machine Learning for the Quantified Self


Machine Learning for the Quantified Self Chapter 5 Clustering

Overview Previously: we identified features. Today: let us use the features to cluster. Topics: learning setup, distance functions, clustering algorithms.

Learning setup (1) Per instance: the instances x1, …, xN of all persons qs1, …, qsn are pooled into one dataset, and every individual instance is an object to be clustered.

Learning setup (2) Per person: each person's entire dataset (x1, …, xN of qsj) is treated as a single object, and we cluster the persons qs1, …, qsn.

Distance metrics (1) We need different distance metrics per scenario. Individual distance metrics (between instances xi and xj over all attributes p): Euclidean distance, Manhattan distance
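The formulas themselves do not survive in the transcript; as a reference, the standard definitions over p attributes (x_i^k denotes the value of attribute k for instance x_i) are:

```latex
\text{euclidean}(x_i, x_j) = \sqrt{\sum_{k=1}^{p} \left(x_i^k - x_j^k\right)^2}
\qquad
\text{manhattan}(x_i, x_j) = \sum_{k=1}^{p} \left|x_i^k - x_j^k\right|
```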

Individual distance metrics (1) The difference:

Individual distance metrics (2) Generalized form: Important to consider: scaling of the data. These all assume numeric values; what if we have other types? We can use Gower’s similarity
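A hedged reconstruction of the generalized form referred to here (the Minkowski distance; q = 1 gives the Manhattan distance, q = 2 the Euclidean distance):

```latex
\text{minkowski}(x_i, x_j) = \left( \sum_{k=1}^{p} \left|x_i^k - x_j^k\right|^{q} \right)^{1/q}
```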

Individual distance metrics (3) Gower’s similarity for different types of features: Dichotomous attributes (present or not) Categorical attributes: Numerical attributes:

Individual distance metrics (4) And we compute the final similarity as:
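The per-type similarities and the final combination are not legible in the transcript; a sketch following Gower's standard formulation, where s(x_i^k, x_j^k) is the per-attribute similarity and δ(x_i^k, x_j^k) = 1 when attribute k can be compared for the two instances (e.g. not missing, and not jointly absent for dichotomous attributes):

```latex
% dichotomous: s = 1 when the property is present in both instances, 0 otherwise
% categorical: s = 1 when the categories are equal, 0 otherwise
% numerical:   s = 1 - |x_i^k - x_j^k| / R_k, with R_k the observed range of attribute k
\text{gower}(x_i, x_j) =
  \frac{\sum_{k=1}^{p} \delta\!\left(x_i^k, x_j^k\right)\, s\!\left(x_i^k, x_j^k\right)}
       {\sum_{k=1}^{p} \delta\!\left(x_i^k, x_j^k\right)}
```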

Distance metrics (2) What if we need to compare datasets instead of individual instances? We can identify person-level distance metrics. These are available in different forms: for datasets without an explicit ordering, and for datasets with a temporal ordering.

Non-temporal personal level distance metrics (1) We have a number of options: (1) summarize the values of attribute i over the entire dataset into a single value (e.g., mean, minimum, maximum, …) and use the same distance metrics we have seen before; (2) estimate the parameters of a distribution per attribute i (e.g., a normal distribution N(μ,σ2)) and apply the same distance metrics to the estimated parameter values; (3) compare the distributions of values for attribute i with a statistical test, e.g. a Kolmogorov-Smirnov test resulting in a p-value, and take 1−p as the distance metric.
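A minimal Python sketch of the third option, assuming the two persons' data arrive as pandas DataFrames with identical columns; averaging 1−p over the attributes is an assumption, not something stated on the slide:

```python
import pandas as pd
from scipy import stats

def ks_distance(df_a: pd.DataFrame, df_b: pd.DataFrame) -> float:
    """Person-level distance: per attribute, run a two-sample
    Kolmogorov-Smirnov test and use 1 - p as the distance, then average."""
    distances = []
    for col in df_a.columns:
        _, p_value = stats.ks_2samp(df_a[col].dropna(), df_b[col].dropna())
        distances.append(1.0 - p_value)
    return sum(distances) / len(distances)
```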

Non-temporal personal level distance metrics (2)

Temporal personal level distance metrics (1) What if we do have an ordering? Raw-data based Feature-based Model-based

Raw-data based distances (1) Simplest case: assume an equal number of points and compute the Euclidean distance on a per point basis:

Raw-data based distances (2) What if the time series are more or less the same, but shifted in time? We can use a lag to compensate; we denote the amount of time shifted by τ. The cross-correlation coefficient between two datasets is computed as:
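A hedged reconstruction of the coefficient for a single attribute k and lag τ (one common formulation; the next slide then optimizes over τ):

```latex
\text{cross\_correlation}(\tau) = \sum_{t} x^k_{qs_i}(t) \cdot x^k_{qs_j}(t + \tau)
```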

Raw-data based distances (3) What is the best value τ to shift? This is an optimization problem (note: one shift over all attributes):

Raw-data based distances (4) Besides this, we can also think about the different frequencies at which different persons perform their activities. We can use dynamic time warping for this purpose: we try to find the pairing of instances across the sequences that gives the minimum distance.

Dynamic Time Warping (1) Pairing?

Dynamic Time Warping (2) Requirements of pairing: the time order should be preserved (monotonicity condition) and the first and last points should be matched (boundary condition). Assume seq(k, qsj) represents the sequence number (in terms of instances) of pair k with respect to the dataset qsj.

Dynamic Time Warping (3) Requirements of pairing: monotonicity condition, boundary condition
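In the slide's seq(k, qsj) notation (with K the total number of pairs and N_{qs_j} the number of instances in dataset qsj), the two conditions can roughly be written as:

```latex
% monotonicity: the pairing never moves backwards in time
\text{seq}(k+1, qs_j) \geq \text{seq}(k, qs_j) \quad \text{for all pairs } k
% boundary: the first and last instances of both series are matched
\text{seq}(1, qs_j) = 1, \qquad \text{seq}(K, qs_j) = N_{qs_j}
```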

Dynamic Time Warping (4) [Illustration of the boundary and monotonicity conditions on the warping path.]

Dynamic Time Warping (5) How do we find the best path? We start at (0,0) and move up and to the right. Per pair we compute the cheapest way to get there, given our constraints and the distance between the instances in the pair. As the distance between instances we can, for example, use the Euclidean distance. Let us look at the algorithm.

Dynamic Time Warping (6)
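The algorithm shown on this slide does not survive in the transcript; a minimal dynamic-programming sketch under simplifying assumptions (single attribute per series, absolute difference as per-point distance, no Keogh bound):

```python
import numpy as np

def dtw_distance(series_a: np.ndarray, series_b: np.ndarray) -> float:
    """Classic DTW: cost of the cheapest monotone pairing that matches
    the first and last points of both series."""
    n, m = len(series_a), len(series_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(series_a[i - 1] - series_b[j - 1])  # per-point distance
            # reach (i, j) via a diagonal, vertical, or horizontal step
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])
```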

Dynamic Time Warping (7) DTW is computationally expensive. Improvement: the Keogh bound. Would you be willing to allow all matches, provided they meet the conditions? Would the best match always be the best distance metric?

Temporal personal level distance metrics (2) What if we do have an ordering? Raw-data based (just covered); feature-based: same as the non-temporal case we have seen; model-based: fit a time series model and use its parameters (again in line with the non-temporal case, except that the type of model differs).

Clustering approaches OK, we know how to define a distance; now what clustering approaches do we have? We assume you have some basic understanding and will scan through the options briefly: k-means, k-medoids, hierarchical clustering.

k-means clustering (1)

k-means clustering (2)
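The k-means pseudocode on these slides is not reproduced in the transcript; a minimal numpy sketch of the standard Lloyd iteration (random initialization, Euclidean distance) under those assumptions:

```python
import numpy as np

def k_means(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Alternate between assigning instances to the closest center and
    recomputing each center as the mean of its assigned instances."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```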

Performance metric How do we select the best value for k? How do we evaluate whether a clustering is good? The silhouette score can be used (range [−1, 1], 1 is best).
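A small sketch of using the silhouette to pick k; scikit-learn's KMeans and silhouette_score are used for brevity, and the candidate range 2-10 is an arbitrary assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(X: np.ndarray, k_values=range(2, 11)) -> int:
    """Return the k whose clustering achieves the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better
    return max(scores, key=scores.get)
```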

k-means clustering (3) CrowdSignals accelerometer data

k-means clustering (4)

k-means clustering (5) CrowdSignals accelerometer data

k-means clustering (6)

k-medoids clustering (1) k-medoids: the same as k-means, except that we do not use artificial mean centers but actual points as centers. Is k-means suitable for person-level distance metrics? k-medoids is known to be more suitable.

k-medoids clustering (2)
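The k-medoids algorithm on this slide is not reproduced here; a minimal sketch of one simple variant (a Voronoi-style iteration over a precomputed distance matrix), which is what makes it usable with person-level distances:

```python
import numpy as np

def k_medoids(D: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """k-medoids on a precomputed n x n distance matrix D: the cluster
    centers are actual instances, so any distance metric can be used."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign to the closest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            # new medoid: the member with the smallest total distance to the others
            new_medoids[c] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return labels, medoids
```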

Hierarchical clustering (1) We can also take a more iterative approach: divisive clustering (start with one cluster and make a split in each step) or agglomerative clustering (start with one cluster per instance and merge). The result can be visualized using a dendrogram.

Hierarchical clustering (2)

Agglomerative clustering (1) We need to know which clusters to merge. Different criteria: single linkage, complete linkage, group average
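Hedged reconstructions of the three criteria, with d the instance-level distance and Ca, Cb the candidate clusters to merge:

```latex
\text{single linkage: } \min_{x \in C_a,\, y \in C_b} d(x, y)
\qquad
\text{complete linkage: } \max_{x \in C_a,\, y \in C_b} d(x, y)
\qquad
\text{group average: } \frac{1}{|C_a|\,|C_b|} \sum_{x \in C_a} \sum_{y \in C_b} d(x, y)
```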

Agglomerative clustering (2) We need to know which clusters to merge. Another criterion: Ward’s criterion (minimize the increase in within-cluster variance):
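A short scipy sketch of agglomerative clustering with Ward's criterion plus the dendrogram visualization mentioned earlier; the random data and the cut into three clusters are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

X = np.random.default_rng(0).normal(size=(50, 3))  # placeholder feature matrix

Z = linkage(X, method='ward')  # 'single', 'complete', 'average' also work
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters

dendrogram(Z)
plt.show()
```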

Agglomerative clustering (3)

Agglomerative clustering (4) CrowdSignals example (Ward’s criterion)

Divisive clustering (1) We start with one big cluster and split. We define the dissimilarity of a point to the other points in the cluster C: We create a new cluster C’ and move the most dissimilar points from C to C’, as long as they are less dissimilar to the points in C’ than to the remaining points in C. Which cluster C should we select to split? The one with the largest diameter:
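Hedged reconstructions of the two quantities the slide refers to (the average dissimilarity of a point to the rest of its cluster, and the cluster diameter):

```latex
\text{dissimilarity}(x, C) = \frac{1}{|C| - 1} \sum_{y \in C,\, y \neq x} d(x, y)
\qquad
\text{diameter}(C) = \max_{x, y \in C} d(x, y)
```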

Divisive clustering (2)

Subspace clustering (1) What if we have a huge number of features? Would everything even out? Very likely. Most clustering approaches do not handle a large number of features well. Subspace clustering can provide a solution: it looks at subspaces of the feature space. We focus on the CLIQUE algorithm (cf. Agrawal, 2005).

Subspace clustering (2) First step: we create units based on our features (ε distinct intervals per feature)

Subspace clustering (3) A unit is defined by upper and lower boundaries per feature: ui(l) is the lower bound, ui(h) the upper bound for the value of attribute i. An instance x is part of the unit when its values fall within the boundaries:
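A reconstruction of the membership condition in the slide's notation:

```latex
x \in u \iff u_i(l) \leq x^i < u_i(h) \quad \text{for every attribute } i \text{ covered by } u
```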

Subspace clustering (4) We can define the selectivity of a unit as: And given a threshold τ a unit is dense when: Previous figure showed dense units (grey)
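A reconstruction of the two definitions, with D the full dataset and τ the density threshold:

```latex
\text{selectivity}(u) = \frac{\left|\{\, x \in D \mid x \in u \,\}\right|}{|D|}
\qquad
u \text{ is dense} \iff \text{selectivity}(u) > \tau
```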

Subspace clustering (5) We want subspaces (subsets of attributes), so our units do not have to cover all attributes; we can have units that cover k attributes. Units have a common face when:

Subspace clustering (6) Units are connected in a cluster when they share a common face, or when they are both connected to a third unit with which each shares a common face: We strive to find dense units using a combination of attributes to form the clusters.

Subspace clustering (7) We start with one attribute and work our way up using some fancy algorithms. Once we have found all the dense units, we form the clusters. The details of the precise computational steps are provided in the book.