Machine Learning Queens College Lecture 1: Introduction.

1 Machine Learning Queens College Lecture 1: Introduction

2 Today
Welcome
Overview of Machine Learning
Class Mechanics
Syllabus Review

3 My research and background
Speech
– Analysis of Intonation
– Segmentation
Natural Language Processing
– Computational Linguistics
Evaluation Measures
All of this research relies heavily on Machine Learning.

4 You
Why are you taking this class?
What is your background and comfort with
– Calculus
– Linear Algebra
– Probability and Statistics
What is your programming language of preference?
– C++, Java, or Python are preferred

5 Machine Learning
Automatically identifying patterns in data
Automatically making decisions based on data
Hypothesis: the behavior obtained from (Data, Learning Algorithm) ≥ the behavior obtained from (Data, Programmer or Expert)

6 Machine Learning in Computer Science
Machine Learning
Biomedical/Chemical Informatics
Financial Modeling
Natural Language Processing
Speech/Audio Processing
Planning
Locomotion
Vision/Image Processing
Robotics
Human Computer Interaction
Analytics

7 Major Tasks
Regression
– Predict a numerical value from “other information”
Classification
– Predict a categorical value
Clustering
– Identify groups of similar entities
Evaluation

8 Feature Representations
How do we view data?
Entity in the World (Web Page, User Behavior, Speech or Audio Data, Vision, Wine, People, etc.) → Feature Extraction → Feature Representation → Machine Learning Algorithm
Our Focus

9 Feature Representations
Height  Weight  Eye Color  Gender
66      170     Blue       Male
73      210     Brown      Male
72      165     Green      Male
70      180     Blue       Male
74      185     Brown      Male
68      155     Green      Male
65      150     Blue       Female
64      120     Brown      Female
63      125     Green      Female
67      140     Blue       Female
68      165     Brown      Female
66      130     Green      Female
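As a minimal sketch of how rows like these might be turned into numeric feature vectors, the Python below copies the table and encodes eye color as a one-hot code. The helper name `encode`, the ordering in `EYE_COLORS`, and the choice of gender as the target value are assumptions made for illustration; the slides do not prescribe an encoding.

```python
# Rows copied from the table: (height, weight, eye color, gender).
ROWS = [
    (66, 170, "Blue", "Male"),
    (73, 210, "Brown", "Male"),
    (72, 165, "Green", "Male"),
    (70, 180, "Blue", "Male"),
    (74, 185, "Brown", "Male"),
    (68, 155, "Green", "Male"),
    (65, 150, "Blue", "Female"),
    (64, 120, "Brown", "Female"),
    (63, 125, "Green", "Female"),
    (67, 140, "Blue", "Female"),
    (68, 165, "Brown", "Female"),
    (66, 130, "Green", "Female"),
]

# Assumed fixed ordering for the one-hot code (not from the slides).
EYE_COLORS = ["Blue", "Brown", "Green"]


def encode(height, weight, eye_color):
    """Map one entity to a numeric feature vector (hypothetical helper)."""
    one_hot = [1.0 if eye_color == color else 0.0 for color in EYE_COLORS]
    return [float(height), float(weight)] + one_hot


# Feature vectors X and, assuming gender is the quantity to predict, targets t.
X = [encode(h, w, e) for h, w, e, _ in ROWS]
t = [g for _, _, _, g in ROWS]

print(X[0], t[0])
```

One-hot coding avoids implying an order among eye colors, which a single integer code would.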

10 Classification
Identify which of N classes a data point, x, belongs to.
x is a column vector of features.

11 Target Values
In supervised approaches, in addition to a data point, x, we will also have access to a target value, t.
Goal of Classification: Identify a function y, such that y(x) = t

12 Feature Representations
Height  Weight  Eye Color  Gender
66      170     Blue       Male
73      210     Brown      Male
72      165     Green      Male
70      180     Blue       Male
74      185     Brown      Male
68      155     Green      Male
65      150     Blue       Female
64      120     Brown      Female
63      125     Green      Female
67      140     Blue       Female
68      165     Brown      Female
66      130     Green      Female
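To make "identify a function y such that y(x) = t" concrete, here is a rough sketch that treats height and weight as the features x and gender as the target t. The 1-nearest-neighbor rule is an assumption chosen for brevity, not a method named in the slides, and the features are left unscaled, so weight dominates the distance.

```python
# (height, weight) features paired with gender targets, copied from the table.
DATA = [
    ((66, 170), "Male"), ((73, 210), "Male"), ((72, 165), "Male"),
    ((70, 180), "Male"), ((74, 185), "Male"), ((68, 155), "Male"),
    ((65, 150), "Female"), ((64, 120), "Female"), ((63, 125), "Female"),
    ((67, 140), "Female"), ((68, 165), "Female"), ((66, 130), "Female"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def y(x):
    """Return the target of the training point closest to x (1-nearest-neighbor)."""
    _, nearest_target = min(DATA, key=lambda pair: squared_distance(pair[0], x))
    return nearest_target


print(y((72, 190)))
print(y((64, 122)))
```

On the training points themselves this rule reproduces t exactly, except where two points share the same features; the interesting question is how it behaves on unseen points.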

13 Graphical Example of Classification

14 Graphical Example of Classification

15 Graphical Example of Classification

16 Graphical Example of Classification

17 Graphical Example of Classification

18 Graphical Example of Classification

19 Decision Boundaries

20 Regression
Regression is a supervised machine learning task.
– So a target value, t, is given.
Classification: nominal t
Regression: continuous t
Goal of Regression: Identify a function y, such that y(x) = t
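For a continuous target, one simple choice of y is a straight line fitted by least squares. The sketch below assumes a single feature and uses made-up data points that lie exactly on t = 2x + 1; the function name `fit_line` and the data are illustrative, not from the lecture.

```python
def fit_line(xs, ts):
    """Least-squares fit of t ≈ slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    covariance = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_t - slope * mean_x
    return slope, intercept


# Made-up training data lying exactly on t = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ts)


def y(x):
    """The learned regression function."""
    return slope * x + intercept


print(slope, intercept, y(4.0))
```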

21 Differences between Classification and Regression
Similar goals: Identify y(x) = t.
What are the differences?
– The form of the function, y (naturally).
– Evaluation
  Root Mean Squared Error
  Absolute Value Error
  Classification Error
  Maximum Likelihood
– Evaluation drives the optimization operation that learns the function, y.
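Three of the evaluation measures listed here can be sketched in a few lines of Python. The function names are illustrative, "Absolute Value Error" is interpreted as the mean absolute error (an assumption), and maximum likelihood is left out because it depends on the choice of model.

```python
import math


def root_mean_squared_error(predictions, targets):
    """Square root of the mean squared difference (regression)."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute_error(predictions, targets):
    """Mean of the absolute differences (regression)."""
    absolute = [abs(p - t) for p, t in zip(predictions, targets)]
    return sum(absolute) / len(absolute)


def classification_error(predictions, targets):
    """Fraction of predictions that do not match the target (classification)."""
    wrong = sum(1 for p, t in zip(predictions, targets) if p != t)
    return wrong / len(targets)


print(root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(classification_error(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

Squaring penalizes large misses more heavily than the absolute error does, which is one way the choice of evaluation shapes the function that gets learned.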

22 Graphical Example of Regression

23 Graphical Example of Regression

24 Graphical Example of Regression

25 Clustering
Clustering is an unsupervised learning task.
– There is no target value to shoot for.
Identify groups of “similar” data points that are “dissimilar” from others.
Partition the data into groups (clusters) that satisfy these constraints:
1. Points in the same cluster should be similar.
2. Points in different clusters should be dissimilar.
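One common way to pursue these two constraints is k-means, sketched below. The slide does not name an algorithm, so k-means, Euclidean distance as the similarity measure, the made-up points, and the hand-picked starting centers are all assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(1, 1), (8, 8)])
print(centers)
print(clusters)
```

No target value appears anywhere in the code: the grouping comes only from distances between the points.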

26 Graphical Example of Clustering

27 Graphical Example of Clustering

28 Graphical Example of Clustering

29 Mechanisms of Machine Learning
Statistical Estimation
– Numerical Optimization
– Theoretical Optimization
Feature Manipulation
Similarity Measures

30 Mathematical Necessities
Probability
Statistics
Calculus
– Vector Calculus
Linear Algebra
Is this a Math course in disguise?

31 Why do we need so much math?
Probability Density Functions allow the evaluation of how likely a data point is under a model.
– Want to identify good PDFs. (calculus)
– Want to evaluate against a known PDF. (algebra)

32 Gaussian Distributions
We use Gaussian Distributions all over the place.

33 Gaussian Distributions
We use Gaussian Distributions all over the place.
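As a small illustration of evaluating how likely data is under a model, the sketch below implements the univariate Gaussian density, estimates its mean and variance from samples (the maximum likelihood estimates), and sums log densities. The function names and the sample values are made up for illustration; the slides show the distribution only graphically.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance at x."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coefficient * math.exp(-((x - mean) ** 2) / (2.0 * variance))


def fit_gaussian(samples):
    """Maximum likelihood estimates of the mean and variance."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance


def log_likelihood(samples, mean, variance):
    """Log of the joint density, assuming independent samples."""
    return sum(math.log(gaussian_pdf(s, mean, variance)) for s in samples)


samples = [1.0, 2.0, 3.0]  # made-up data
mean, variance = fit_gaussian(samples)
print(mean, variance)
print(log_likelihood(samples, mean, variance))
print(log_likelihood(samples, 0.0, 1.0))  # a worse model scores lower
```

Finding the best parameters is the "identify good PDFs" step; plugging a point into a fixed density is the "evaluate against a known PDF" step.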

34 Data Data Data
“There’s no data like more data”
All machine learning techniques rely on the availability of data to learn from.
There is an ever-increasing amount of data being generated, but it’s not always easy to process.
– UCI http://archive.ics.uci.edu/ml/
– LDC (Linguistic Data Consortium) http://www.ldc.upenn.edu/
– Contact me for speech data.
Is all data equal?

35 Class Structure and Policies
Course website:
– http://eniac.cs.qc.cuny.edu/andrew/ml/syllabus.html
Email list
– CUNY First has an email function – most students do not use the associated email address…
– Put your email address on the sign-up sheet.

