Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computational Methods for Data Analysis Massimo Poesio INTRO TO MACHINE LEARNING.

Similar presentations


Presentation on theme: "Computational Methods for Data Analysis Massimo Poesio INTRO TO MACHINE LEARNING."— Presentation transcript:

1 Computational Methods for Data Analysis Massimo Poesio INTRO TO MACHINE LEARNING

2 WHAT IS LEARNING Memorizing something Learning facts through observation and exploration Developing motor and/or cognitive skills through practice Organizing new knowledge into general, effective representations

3 Machine Learning - Grew out of work in AI -Solve problems where rules can’t be written by hand MACHINE LEARNING

4

5 SPAM

6 Machine Learning - Grew out of work in AI - Solve problems where rules can’t be written by hand Examples: - Database mining Large datasets from growth of automation/web. E.g., Web click data, medical records, biology, engineering - Applications can’t program by hand. E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision. MACHINE LEARNING

7 Machine Learning - Grew out of work in AI - New capability for computers Examples: - Database mining Large datasets from growth of automation/web. E.g., Web click data, medical records, biology, engineering - Applications can’t program by hand. E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision. - Self-customizing programs E.g., Amazon, Netflix product recommendations - Understanding human learning (brain, real AI).

8 Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed. Tom Mitchell (1998) Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Machine Learning definition

9 Classifying emails as spam or not spam. Watching you label emails as spam or not spam. The number (or fraction) of emails correctly classified as spam/not spam. None of the above—this is not a machine learning problem. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

10 Classifying emails as spam or not spam. Watching you label emails as spam or not spam. The number (or fraction) of emails correctly classified as spam/not spam. None of the above—this is not a machine learning problem. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

11 Classifying emails as spam or not spam. Watching you label emails as spam or not spam. The number (or fraction) of emails correctly classified as spam/not spam. None of the above—this is not a machine learning problem. Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting? “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

12 A SPATIAL VIEW LEARNING Learning to discriminate between spam and non-spam can be pictured as learning how to discriminate between different types of objects in a space

13 A SPATIAL VIEW OF LEARNING The task of the learner is to learn a function that divides the space of examples into black and red

14 EXAMPLE: SPAM SPAM NON-SPAM

15 A MORE DIFFICULT EXAMPLE

16 ONE SOLUTION

17 ANOTHER SOLUTION

18 LEARNING A FUNCTION Given a set of input / output pairs, find a function that does a good job of expressing the relationship: – Wordsense disambiguation as a function from words (the input) to their senses (the outputs) – Categorizing email messages as a function from emails to their category (spam, useful) – A checker playing strategy a function from moves to their values (winning, losing)

19 WAYS OF LEARNING A FUNCTION SUPERVISED: given a set of example input / output pairs, find a rule that does a good job of predicting the output associated with an input UNSUPERVISED learning or CLUSTERING: given a set of examples, but no labelling, group the examples into “natural” clusters REINFORCEMENT LEARNING: an agent interacting with the world makes observations, takes action, and is rewarded or punished; the agent learns to take action in order to maximize reward

20 Machine learning algorithms: -Supervised learning -Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms.

21 Machine learning algorithms: -Supervised learning -Unsupervised learning Others: Reinforcement learning, recommender systems. Also talk about: Practical advice for applying learning algorithms.

22 Supervised Learning

23 EXAMPLE: HOUSING PRICE PREDICTION Price ($) in 1000’s Size in feet 2 Regression: Predict continuous valued output (price) Supervised Learning “right answers” given

24 Housing price prediction. Price ($) in 1000’s Size in feet 2 Regression: Predict continuous valued output (price) Supervised Learning “right answers” given

25 Spam, non-spam Classification Discrete valued output (0 or 1) Spam? 1(Y) 0(N) Length of message (words)

26 Breast cancer (malignant, benign) Classification Discrete valued output (0 or 1) Malignant? 1(Y) 0(N) Tumor Size

27 Length of message From: -Occurrence of word ‘Nigeria’ -Occurrence of word ‘million dollars’ -…

28 Tumor Size Age -Clump Thickness -Uniformity of Cell Size -Uniformity of Cell Shape …

29 Treat both as classification problems. Treat problem 1 as a classification problem, problem 2 as a regression problem. Treat problem 1 as a regression problem, problem 2 as a classification problem. Treat both as regression problems. You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months. Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems?

30 Treat both as classification problems. Treat problem 1 as a classification problem, problem 2 as a regression problem. Treat problem 1 as a regression problem, problem 2 as a classification problem. Treat both as regression problems. You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months. Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems?

31 Unsupervised Learning

32 x1x1 x2x2 Supervised Learning

33 Unsupervised Learning x1x1 x2x2

34 x1x1 x2x2

35

36

37

38

39 [Source: Daphne Koller] Genes Individuals

40 [Source: Daphne Koller] Genes Individuals

41 Organize computing clusters Social network analysis Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison) Astronomical data analysis Market segmentation

42 Organize computing clusters Social network analysis Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison) Astronomical data analysis Market segmentation

43 Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.) Given a database of customer data, automatically discover market segments and group customers into different market segments. Given email labeled as spam/not spam, learn a spam filter. Given a set of news articles found on the web, group them into set of articles about the same story. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

44 44 History of Machine Learning 1950s – Samuel’s checker player – Selfridge’s Pandemonium 1960s: – Neural networks: Perceptron – Pattern recognition – Learning in the limit theory – Minsky and Papert prove limitations of Perceptron 1970s: – Symbolic concept induction – Winston’s arch learner – Expert systems and the knowledge acquisition bottleneck – Quinlan’s ID3 – Michalski’s AQ and soybean diagnosis – Scientific discovery with BACON – Mathematical discovery with AM

45 45 History of Machine Learning (cont.) 1980s: – Advanced decision tree and rule learning – Explanation-based Learning (EBL) – Learning and planning and problem solving – Utility problem – Analogy – Cognitive architectures – Resurgence of neural networks (connectionism, backpropagation) – Valiant’s PAC Learning Theory – Focus on experimental methodology 1990s – Data mining – Adaptive software agents and web applications – Text learning – Reinforcement learning (RL) – Inductive Logic Programming (ILP) – Ensembles: Bagging, Boosting, and Stacking – Bayes Net learning

46 46 History of Machine Learning (cont.) 2000s – Support vector machines – Kernel methods – Graphical models – Statistical relational learning – Transfer learning – Sequence labeling – Collective classification and structured outputs – Computer Systems Applications Compilers Debugging Graphics Security (intrusion, virus, and worm detection) – Email management – Personalized assistants that learn – Learning in robotics and vision

47 READINGS English: – T. Mitchell, Machine Learning, Mc-Graw Hill, ch.1 Italian: – R. Basili & A. Moschitti, Apprendimento automatico, in F. Bianchini et al, Instrumentum vocale

48 THANKS I used materials from – Andrew Ng’s Coursera course at Stanford – Ray Mooney’s ML course at Utexas


Download ppt "Computational Methods for Data Analysis Massimo Poesio INTRO TO MACHINE LEARNING."

Similar presentations


Ads by Google