
1 Machine Learning ICS 178 Instructor: Max Welling visualization & k nearest neighbors

2 Types of Learning
- Supervised learning: labels are provided, so there is a strong learning signal. E.g. classification, regression.
- Semi-supervised learning: only part of the data has labels. E.g. a child growing up.
- Reinforcement learning: the learning signal is a (scalar) reward and may come with a delay. E.g. learning to play chess, or a mouse in a maze.
- Unsupervised learning: there is no direct learning signal; we simply try to find structure in the data. E.g. clustering, dimensionality reduction.

3 Ingredients
- Data: what kind of data do we have?
- Prior assumptions: what do we know a priori about the problem?
- Representation: how do we represent the data?
- Model / hypothesis space: what hypotheses are we willing to entertain to explain the data?
- Feedback / learning signal: what kind of learning signal do we have (delayed, labels)?
- Learning algorithm: how do we update the model (or set of hypotheses) from feedback?
- Evaluation: how well did we do? Should we change the model?

4 Data Preprocessing
Before you start modeling the data, you want to have a look at it to get a "feel" for it.
- What are the "modalities" of the data? E.g. Netflix: users and movies; text: word-tokens and documents; video: pixels, frames, color index (R,G,B).
- What is the domain? Netflix: rating values [1,2,3,4,5,?]; text: number of times a word appears [0,1,2,3,...]; video: brightness values [0,...,255] or real-valued.
- Are there missing data entries?
- Are there outliers in the data (perhaps a typo)?
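A minimal sketch of such a first look, assuming a hypothetical Netflix-style ratings matrix R (rows are users, columns are movies) with missing entries stored as NaN:

```python
import numpy as np

# Hypothetical toy ratings matrix: rows = users, columns = movies;
# NaN marks a missing rating (the "?" in the domain above).
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, 5.0, np.nan],
              [1.0, np.nan, 4.0, 4.0]])

observed = R[~np.isnan(R)]

# Domain check: observed values should lie in {1,...,5}.
print("observed rating values:", np.unique(observed))

# Missing entries and possible typos/outliers.
print("fraction missing: %.2f" % np.isnan(R).mean())
print("out-of-domain entries:", np.sum((observed < 1) | (observed > 5)))
```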

5 Data Preprocessing
Often it is a good idea to compute the mean and variance of the data: the mean gives you a sense of location, the variance/STD a sense of scale.
Better still is to histogram the data. Tricky issue: how do you choose the bin size? Too small and you see noise; too big and it's one clump.
mean: μ = (1/N) Σᵢ xᵢ; variance: σ² = (1/N) Σᵢ (xᵢ − μ)²; standard deviation: σ = √σ²
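A minimal sketch of the bin-size trade-off in NumPy/Matplotlib, on synthetic data (the sample size and bin counts are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(1000)  # synthetic data with mean ~0, std ~1
print("mean: %.3f  variance: %.3f  std: %.3f" % (x.mean(), x.var(), x.std()))

# The same data with three bin sizes: too many bins shows noise,
# too few collapses everything into one clump.
for i, bins in enumerate([200, 20, 2]):
    plt.subplot(1, 3, i + 1)
    plt.hist(x, bins=bins)
    plt.title("%d bins" % bins)
plt.show()
```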

6 Preprocessing
For Netflix you can histogram this for both modalities:
- The rating distribution over users for a movie.
- The rating distribution over movies for a user.
- The rating distribution over users for all movies jointly.
- The rating distribution over all movies for all users jointly.
You can also compute properties and plot them against each other. For example: compute the user-specific mean and variance over movies and draw a scatter plot, as sketched below.
[Scatter plot: user-mean vs. user-variance; every dot is a different user.]
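A sketch of that scatter plot, again on a hypothetical ratings matrix R with NaN for missing entries (np.nanmean and np.nanvar skip the missing values):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ratings: 200 users x 50 movies, ~70% missing.
ratings = np.random.randint(1, 6, size=(200, 50)).astype(float)
R = np.where(np.random.rand(200, 50) < 0.7, np.nan, ratings)

user_mean = np.nanmean(R, axis=1)  # per-user mean over movies
user_var = np.nanvar(R, axis=1)    # per-user variance over movies

plt.scatter(user_mean, user_var)   # every dot is a different user
plt.xlabel("user-mean")
plt.ylabel("user-variance")
plt.show()
```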

7 Scatter-Plots This shows all the 2-D projections of the “Iris data”. Color indicates the class of iris. How many attributes do we have for Iris?
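A sketch reproducing this kind of plot with scikit-learn's copy of the Iris data; pandas' scatter_matrix is one common way to draw all pairwise projections (not necessarily how the slide's figure was made):

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
print("number of attributes:", df.shape[1])  # 4 for Iris

# All pairwise 2-D projections; color indicates the class of iris.
scatter_matrix(df, c=iris.target, figsize=(8, 8))
plt.show()
```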

8 3-D Visualization
[Figures: a contour plot and a meshgrid plot; a sketch of both follows.]
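A minimal sketch of both plot types in NumPy/Matplotlib, using an arbitrary example function on a regular grid:

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate an example function on a regular grid.
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))  # arbitrary smooth bump

# Contour plot: 2-D level sets.
plt.subplot(1, 2, 1)
plt.contour(X, Y, Z)

# Meshgrid (wireframe) plot: 3-D view of the same function.
ax = plt.subplot(1, 2, 2, projection="3d")
ax.plot_wireframe(X, Y, Z, rstride=5, cstride=5)
plt.show()
```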

9 Embeddings
- Every red dot represents an image; an image has roughly 1000 pixels.
- Each image is projected to a 2-D space.
- The projections are such that similar images are projected to nearby locations in the 2-D embedding.
- This gives us an idea of how the data is organized.
These plots are produced by "locally linear embedding": http://www.cs.toronto.edu/~roweis/lle/
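Scikit-learn ships an implementation of locally linear embedding; a minimal sketch on its small digits dataset (64 pixels per image rather than ~1000; the neighbor count is an illustrative choice):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

# Project each 64-pixel digit image to 2-D so that similar
# images land at nearby locations in the embedding.
digits = load_digits()
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Z = lle.fit_transform(digits.data)

plt.scatter(Z[:, 0], Z[:, 1], c=digits.target, s=5)
plt.show()
```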

10 Embeddings

11 Visualization by Clustering
By performing a clustering of the data and looking at the cluster prototypes, you can get an idea of the type of data you are dealing with.
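A sketch of that idea using k-means (one clustering method among many): cluster the digit images and display the cluster centers, i.e. the prototypes, as images. The cluster count is an arbitrary choice here.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
km = KMeans(n_clusters=10, n_init=10).fit(digits.data)

# Each cluster center is a 64-dim vector; reshape it to an 8x8
# "prototype" image to see what kind of data each cluster holds.
for i, center in enumerate(km.cluster_centers_):
    plt.subplot(2, 5, i + 1)
    plt.imshow(center.reshape(8, 8), cmap="gray")
    plt.axis("off")
plt.show()
```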

12 Preprocessing
Often it is useful to "standardize" (or "whiten") the data before you start modeling. The idea is to subtract the mean and scale away the variance so that your algorithm can focus on more sophisticated (higher-order) structure.
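Standardization is a per-feature version of the mean/variance formulas from slide 5; a minimal sketch:

```python
import numpy as np

# Three features on wildly different scales.
X = np.random.rand(100, 3) * [1, 10, 100]

# Subtract the per-feature mean, divide by the per-feature std.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~0 for every feature
print(X_std.std(axis=0))   # ~1 for every feature
```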

13 Be Creative! WEKA DEMO

