
1 Machine Learning
Saarland University, SS 2007
Holger Bast, Marjan Celikik, Kevin Chang, Stefan Funke, Joachim Giesen
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Lecture 1, Friday, April 19th, 2007 (basics and example applications)

2 Overview of this Lecture
Machine Learning Basics
–Classification
–Objects as feature vectors
–Regression
–Clustering
Example applications
–Surface reconstruction
–Preference learning
–Netflix challenge (how to earn $1,000,000)
–Text search

3 Classification
Given a set of points, each labeled + or –
–learn something from them …
–… in order to predict the label of new points
[figure: points labeled + and – in the plane, plus a new unlabeled point marked ?]
This is an instance of supervised learning.
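As a concrete illustration (not from the slides), here is a minimal 1-nearest-neighbor classifier in Python; the points and labels are made-up example data.

```python
import numpy as np

# Labeled training points in the plane (hypothetical example data)
points = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],   # "+" region
                   [6.0, 5.0], [6.5, 5.5], [7.0, 4.8]])  # "-" region
labels = np.array(["+", "+", "+", "-", "-", "-"])

def predict(x):
    """Predict the label of a new point x from its nearest labeled neighbor."""
    distances = np.linalg.norm(points - x, axis=1)
    return labels[np.argmin(distances)]

print(predict(np.array([2.0, 2.0])))  # -> "+"
print(predict(np.array([6.2, 5.1])))  # -> "-"
```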

4 Classification — Quality
Which classifier is better?
–the answer requires a model of where the data comes from
–and a measure of quality/accuracy
[figure: the same labeled points with a new unlabeled point marked ?]
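One concrete quality measure (the slide leaves the choice open, so this is just an illustration) is accuracy on held-out points:

```python
import numpy as np

# Accuracy: the fraction of correctly predicted labels on points
# that were held out from training (labels here are made up).
true_labels = np.array(["+", "+", "-", "-", "-"])
predicted   = np.array(["+", "-", "-", "-", "-"])
print((predicted == true_labels).mean())  # 0.8
```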

5 Classification — Outliers and Overfitting
We have to find a balance between two extremes
–oversimplification (→ large classification error)
–overfitting (→ lack of regularity)
–again: requires a model of the data
[figure: labeled points with a few outliers of the opposite label among them]
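The same trade-off, illustrated in code on a synthetic regression-style data set (an analogy, not the slide's example): a degree-1 fit may oversimplify, while a degree-9 fit chases every point, including the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.1, size=10)  # noisy linear data (synthetic)

# Low degree: may oversimplify; high degree: interpolates the noise (overfits).
simple = np.polynomial.Polynomial.fit(x, y, deg=1)
wiggly = np.polynomial.Polynomial.fit(x, y, deg=9)

x_new = 0.95
print(simple(x_new), wiggly(x_new))  # the degree-9 prediction can swing wildly
```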

6 Classification — Point Transformation
If a classifier does not work for the original data
–try it on a transformation of the data
–typically: make points linearly separable by a suitable mapping to a higher-dimensional space
[figure: 1-dimensional points, – around 0 and + further out, not linearly separable; after mapping x to (x, |x|) they become linearly separable in the plane]
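A minimal sketch of the mapping from the slide, x to (x, |x|), with made-up data: points labeled + on both sides of an inner – region become separable by a horizontal line in the plane.

```python
import numpy as np

# 1-D points: "-" near 0, "+" further out (hypothetical example data)
x = np.array([-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0, 3.0])
y = np.array(["+", "+", "+", "-", "-", "-", "+", "+", "+"])

# Map each point x to (x, |x|): in the plane, the horizontal line
# at height 1 now separates the two labels.
lifted = np.column_stack([x, np.abs(x)])
pred = np.where(lifted[:, 1] > 1.0, "+", "-")
print((pred == y).all())  # True: the lifted points are linearly separable
```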

7 Classification — More Labels
[figure: points with three different labels +, –, and o]
Typically:
–first, a basic technique for binary classification
–then, an extension to more labels
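One common such extension (the slide does not name one, so this is an assumption) is one-vs-rest: train one binary scorer per label and predict the label whose scorer likes the point best. A sketch with a nearest-centroid scorer standing in for the binary classifier:

```python
import numpy as np

points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.9],
                   [9.0, 1.0], [8.8, 1.2]])  # made-up data, three labels
labels = np.array(["+", "+", "-", "-", "o", "o"])

def binary_fit(points, is_positive):
    """A stand-in binary scorer: higher score = closer to the class centroid."""
    center = points[is_positive].mean(axis=0)
    return lambda x: -np.linalg.norm(x - center)

# One-vs-rest: one scorer per label; predict the best-scoring label.
scorers = {lab: binary_fit(points, labels == lab) for lab in set(labels)}
x_new = np.array([5.05, 5.05])
print(max(scorers, key=lambda lab: scorers[lab](x_new)))  # -> "-"
```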

8 Objects as Feature Vectors
But why learn something about points?
General idea:
–represent objects as points in a space of fixed dimension
–each dimension corresponds to a so-called feature of the object
Crucial:
–selection of features
–normalization of vectors

9 Objects as Feature Vectors
Example: objects with attributes
–features = attribute values
–normalize by a reference value for each feature

           Person 1   Person 2   Person 3   Person 4
height     188 cm     181 cm     190 cm     172 cm
weight     75 kg      90 kg      77 kg      55 kg
age        36         33         34         24

Raw feature vectors (height, weight, age):
(188, 75, 36)  (181, 90, 33)  (190, 77, 34)  (172, 55, 24)

Normalized (height/180, weight/80, age/40):
(1.04, 0.94, 0.90)  (1.01, 1.13, 0.83)  (1.06, 0.96, 0.85)  (0.96, 0.69, 0.60)
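The normalization step as a small Python sketch (reference values 180 cm, 80 kg, and 40 years, as reconstructed from the slide's numbers):

```python
import numpy as np

# One row per person: (height in cm, weight in kg, age in years)
persons = np.array([[188, 75, 36],
                    [181, 90, 33],
                    [190, 77, 34],
                    [172, 55, 24]], dtype=float)

# Divide each feature by its reference value so the coordinates
# become comparable in magnitude.
reference = np.array([180.0, 80.0, 40.0])
print((persons / reference).round(2))  # first row -> [1.04 0.94 0.9]
```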

10 Objects as Feature Vectors
Example: images
–features = pixels (with grey values)
–often fine without further normalization

Image 1 (grey values):   Image 2 (grey values):
2 8 2                    1 6 1
8 5 8                    6 6 6
2 7 2                    1 6 1

Feature vector = (pixel (1,1), pixel (1,2), pixel (1,3), pixel (2,1), …, pixel (3,3)):
Image 1 -> (2, 8, 2, 8, 5, 8, 2, 7, 2)
Image 2 -> (1, 6, 1, 6, 6, 6, 1, 6, 1)
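The flattening is a one-liner in Python; the two 3×3 grey-value images are the ones from the slide:

```python
import numpy as np

image1 = np.array([[2, 8, 2], [8, 5, 8], [2, 7, 2]])
image2 = np.array([[1, 6, 1], [6, 6, 6], [1, 6, 1]])

# Row-major flattening turns each image into a 9-dimensional feature vector.
print(image1.flatten())  # [2 8 2 8 5 8 2 7 2]
print(image2.flatten())  # [1 6 1 6 6 6 1 6 1]
```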

11 Objects as Feature Vectors
Example: text documents
–features = words
–normalize to unit norm

Doc. 1: Machine Learning SS 2007
Doc. 2: Statistical Learning Theory SS 2007
Doc. 3: Statistical Learning Theory SS 2006

Word-count vectors (coordinates: Learning, Machine, SS, Statistical, Theory, 2006, 2007):
Doc. 1 -> (1, 1, 1, 0, 0, 0, 1)
Doc. 2 -> (1, 0, 1, 1, 1, 0, 1)
Doc. 3 -> (1, 0, 1, 1, 1, 1, 0)

12 Objects as Feature Vectors
Example: text documents
–features = words
–normalize to unit norm

After normalization to unit (Euclidean) norm:
Doc. 1 -> (0.5, 0.5, 0.5, 0, 0, 0, 0.5)
Doc. 2 -> (0.45, 0, 0.45, 0.45, 0.45, 0, 0.45)
Doc. 3 -> (0.45, 0, 0.45, 0.45, 0.45, 0.45, 0)
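Both steps in code (count, then divide by the Euclidean norm), using the three documents from slides 11 and 12:

```python
import numpy as np

vocab = ["learning", "machine", "ss", "statistical", "theory", "2006", "2007"]
docs = ["machine learning ss 2007",
        "statistical learning theory ss 2007",
        "statistical learning theory ss 2006"]

# Count occurrences of each vocabulary word in each document ...
counts = np.array([[d.split().count(w) for w in vocab] for d in docs],
                  dtype=float)
# ... then scale every row to unit Euclidean norm.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
print(unit.round(2))  # Doc. 1 -> [0.5 0.5 0.5 0. 0. 0. 0.5]
```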

13-14 Regression
Learn a function that maps objects to values
Similar trade-off as for classification:
–risk of oversimplification vs. risk of overfitting
[figure: points (x, value) with a fitted curve and a new point marked ?; horizontal axis: given value (typically multi-dimensional), vertical axis: value to learn (typically a real number)]
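As an illustration (not from the slides), a least-squares line fit on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 20)                    # given values (here 1-D)
y = 3 * x + 1 + rng.normal(0, 0.5, size=20)  # values to learn, with noise

# A low-degree least-squares fit: simple enough to avoid overfitting.
line = np.polynomial.Polynomial.fit(x, y, deg=1)
print(line(2.5))  # prediction for a new object, close to 3 * 2.5 + 1 = 8.5
```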

15-18 Clustering
Partition a given set of points into clusters
Similar problems as for classification
–follow the data distribution, but not too closely
–transformation often helps (next slide)
[figure: unlabeled points x falling into two groups]
This is an instance of unsupervised learning.
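A minimal clustering sketch in Python (Lloyd's k-means algorithm, as one standard choice; the slides do not fix a method, and the points are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two synthetic groups of points in the plane
points = np.vstack([rng.normal([0, 0], 0.5, (10, 2)),
                    rng.normal([5, 5], 0.5, (10, 2))])

k = 2
centers = points[rng.choice(len(points), size=k, replace=False)]
for _ in range(10):
    # Assign every point to its nearest center, then move each center
    # to the mean of its assigned points (Lloyd's iteration).
    assign = np.argmin(np.linalg.norm(points[:, None] - centers, axis=2), axis=1)
    centers = np.array([points[assign == j].mean(axis=0) for j in range(k)])
print(centers.round(1))  # roughly [0, 0] and [5, 5]
```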

19 Clustering — Transformation
For clustering, typically dimension reduction helps
–whereas in classification, embedding in a higher-dimensional space typically helps

Word-document matrix (columns: doc1 … doc5):
internet  1 0 1 0 0
web       1 1 0 0 0
surfing   1 1 1 1 0
beach     0 0 0 1 1

–the vectors for documents 2, 3, and 4 are equally dissimilar
–after projecting to 2 dimensions, a 2-clustering would work fine
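One standard way to compute such a projection is a rank-2 SVD (the slide does not name the method, so this is an assumption), applied to the word-document matrix above:

```python
import numpy as np

# Word-document matrix from the slide (rows: internet, web, surfing, beach)
A = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)

# Keep only the two strongest singular directions:
# every document becomes a point in the plane.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_2d = (np.diag(s[:2]) @ Vt[:2]).T  # one 2-D row per document
print(docs_2d.round(2))  # documents 1-3 and 4-5 form two visible groups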


21 Application Example: Text Search
676 abstracts from the Max-Planck-Institute
–for example: "We present two theoretically interesting and empirically successful techniques for improving the linear programming approaches, namely graph transformation and local cuts, in the context of the Steiner problem. We show the impact of these techniques on the solution of the largest benchmark instances ever solved."
–3283 words (stop words like and, or, this, … removed)
–the abstracts come from 5 working groups: Algorithms, Logic, Graphics, CompBio, Databases
–reduce to 10 concepts
No dictionary, no training, only the plain text itself!
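The slide does not spell out the method; a plausible reading of "reduce to 10 concepts" is a truncated SVD of the word-abstract count matrix (latent semantic analysis). A sketch on a randomly filled stand-in matrix of the stated size:

```python
import numpy as np

# Stand-in for the real data: a 3283-word x 676-abstract count matrix.
rng = np.random.default_rng(3)
counts = rng.poisson(0.05, size=(3283, 676)).astype(float)

# Keep the 10 strongest singular directions: each abstract is then
# described by 10 "concept" coordinates instead of 3283 word counts.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
concepts = (np.diag(s[:10]) @ Vt[:10]).T  # 676 abstracts x 10 concepts
print(concepts.shape)  # (676, 10)
```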

