Today’s Topics Chapter 2 in One Slide Chapter 18: Machine Learning (ML) Creating an ML Dataset –“Fixed-length feature vectors” –Relational/graph-based.


1 Today’s Topics
Chapter 2 in One Slide
Chapter 18: Machine Learning (ML)
Creating an ML Dataset
–“Fixed-length feature vectors”
–Relational/graph-based examples
HW0 (due in one week)
Getting ‘Labeled’ Training Examples
Train/Tune/Test Sets
N-fold Cross Validation
9/8/15, CS 540 - Fall 2015 (Shavlik©), Lecture 2, Week 1

2 The Big AI Picture – Chapter 2
The study of ‘agents’ that exist in an environment and perceive, act, and learn
[Diagram: an AI “Agent” interacting with its Environment in a loop: 1: Sense, 2: Reason, 3: Act, 4: Get Feedback, 5: Learn]

3 What Do You Think Machine Learning Means?
Given:
Do:
Throughout the semester, think of what is missing in current ML, compared to human learning

4 What is Learning?
“Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.” - Herbert Simon
“Learning is making useful changes in our minds.” - Marvin Minsky
But remember, cheese and wine get better over time but don’t learn!

5 Supervised Machine Learning: Task Overview
[Diagram: Real World, then Feature Space, then Concepts/Classes/Decisions]
Feature “Design” (usually done by humans): maps the Real World into the Feature Space
Classifier Construction (done by learning algorithm): maps the Feature Space to Concepts/Classes/Decisions

6 Standard Approach for Constructing an ML Dataset for a Task
Step 1: Choose a feature space
–We will use fixed-length feature vectors
–Choose N features
–Each feature has V_i possible values
–Each example is represented by a vector of N feature values (ie, it is a point in the feature space), eg, color, weight, shape
Step 2: Collect examples (“I/O” pairs)
[Figure: the features color, shape, and weight define a space]
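
The two steps on this slide can be sketched in a few lines of Python. This is only an illustration: the feature names come from the slide, but the value lists, the binning of weight into three values, and the helper names `feature_space_size` and `make_example` are assumptions made for the example.

```python
from math import prod

# Hypothetical feature space: each feature maps to its V_i possible values.
# Weight is continuous on the slides; it is binned here (an assumption)
# so that the space is finite and its size can be counted.
FEATURES = {
    "color": ["red", "blue", "green"],
    "weight": ["light", "medium", "heavy"],
    "shape": ["triangle", "square", "circle", "ellipse"],
}

def feature_space_size(features):
    """Number of distinct points in the space: the product of the V_i."""
    return prod(len(values) for values in features.values())

def make_example(features, **assignments):
    """Build a fixed-length vector with one value per feature, in feature order."""
    vector = []
    for name, allowed in features.items():
        value = assignments[name]
        if value not in allowed:
            raise ValueError(f"{value!r} is not a legal value for {name}")
        vector.append(value)
    return tuple(vector)

# Step 1: an example is a point in the feature space.
example = make_example(FEATURES, color="red", weight="light", shape="circle")

# Step 2: an "I/O" pair is a feature vector plus its output category.
labeled_example = (example, True)

print(feature_space_size(FEATURES), labeled_example)
```

Every example has exactly N entries regardless of the object it describes, which is what "fixed-length" means here and what the relational datasets on later slides lack.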

7 Another View of Std ML Datasets - a Single Table (2D array)
            Feature 1   Feature 2   ...   Feature N   Output Category
Example 1   0.0         small       ...   red         true
Example 2   9.3         medium      ...   red         false
Example 3   8.2         small       ...   blue        false
...
Example M   5.7         medium      ...   green       true

8 Standard Feature Types for representing training examples – a source of “domain knowledge”
Nominal (including Boolean)
–No ordering among possible values, eg, color ∈ {red, blue, green} (vs. color = 1000 Hertz)
Linear (or Ordered)
–Possible values of the feature are totally ordered, eg, size ∈ {small, medium, large} ← discrete; weight ∈ [0…500] ← continuous
Hierarchical (not commonly used)
–Possible values are partially ordered in an ISA hierarchy, eg, shape → closed → {polygon → {triangle, square}, continuous → {circle, ellipse}}
Keep your eye out for places where domain knowledge is (or should be) used in ML
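
One way to see the difference between the three feature types is to ask which comparisons each one supports. The sketch below is an illustration only, using the example values from the slide; the names `size_less_than`, `ISA_PARENT`, and `isa` are hypothetical.

```python
# Nominal: values have no order, so only equality tests are meaningful.
COLORS = {"red", "blue", "green"}

# Linear (ordered): position in the list defines a total order.
SIZES = ["small", "medium", "large"]

def size_less_than(a, b):
    """Compare two ordered values by their position in SIZES."""
    return SIZES.index(a) < SIZES.index(b)

# Hierarchical: each value points to its parent in the ISA hierarchy.
ISA_PARENT = {
    "closed": "shape",
    "polygon": "closed",
    "continuous": "closed",
    "triangle": "polygon",
    "square": "polygon",
    "circle": "continuous",
    "ellipse": "continuous",
}

def isa(value, ancestor):
    """True if value equals ancestor or lies below it in the hierarchy."""
    while value is not None:
        if value == ancestor:
            return True
        value = ISA_PARENT.get(value)  # None once we pass the root
    return False

print(size_less_than("small", "large"))                 # ordered values are comparable
print(isa("triangle", "polygon"))                       # triangle ISA polygon
print(isa("triangle", "circle"), isa("circle", "triangle"))  # incomparable pair
```

The last line shows why the hierarchy is only a partial order: triangle and circle share the ancestor "closed", yet neither is above the other.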

9 A Richer Testbed: The Internet Movie Database (IMDB)
IMDB richly represents data
Note: each movie is potentially represented by a graph of a different size
[Figure from David Jensen of UMass]

10 Learning with Data in Multiple Tables (Relational ML) – not covered in cs540
[Figure: a Patients table linked to tables of Previous Mammograms, Previous Blood Tests, and Prev. Rx]
Key challenge: different amount of data for each patient

11 HW0 – Reading in a Dataset
Due in one week (most HWs will have two weeks between when assigned and when due)
The Thoracic Surgery Dataset (original version)

12 Getting Labeled Examples
The ‘Achilles Heel’ of ML
Often ‘experts’ label
–eg, ‘books I like’ or ‘patients that should get drug X’
‘Time will tell’ concepts
–wait a month and see if medical treatment worked or stock appreciated over a year
Use of Amazon Mechanical Turk
–‘the crowd’
Need representative examples, especially good ‘negative’ (counter) examples

13 If it is Free, You are the Product
Google is using authentication (as a human) as a way to get labeled data for their ML algorithms!

14 IID and Other Assumptions
We are assuming examples are IID: independent and identically distributed
We are ignoring temporal dependencies (covered in time-series learning)
We assume the ML algo has no say in which examples it gets (covered in active learning)
Data arrives in any order

15 Train/Tune/Test Sets: A Pictorial Overview
[Diagram: a collection of classified examples (here each column is an example) is split into training examples and testing examples, and the training examples are split again into a train’ set and a tune set. The ML algo generates solutions from the train’ set and selects the best one using the tune set. The resulting classifier is applied to the testing examples to estimate expected accuracy on future examples.]
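
The flow in this diagram can be sketched as follows. The slide does not name a learner, so this example swaps in a one-dimensional k-nearest-neighbor classifier with k as the parameter to tune; that choice, the synthetic data, the 60/20/20 split, and all function names are assumptions for illustration.

```python
import random

def split_train_tune_test(examples, tune_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the examples and cut it into train', tune, and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_tune = int(len(shuffled) * tune_frac)
    test = shuffled[:n_test]
    tune = shuffled[n_test:n_test + n_tune]
    train = shuffled[n_test + n_tune:]
    return train, tune, test

def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes * 2 > len(nearest)

def accuracy(train, eval_set, k):
    """Fraction of eval_set that k-NN (trained on train) labels correctly."""
    correct = sum(1 for x, label in eval_set if knn_predict(train, x, k) == label)
    return correct / len(eval_set)

# Synthetic labeled examples: label is (x > 5), with 10% label noise.
rng = random.Random(1)
examples = []
for _ in range(200):
    x = rng.uniform(0, 10)
    label = x > 5
    if rng.random() < 0.1:
        label = not label
    examples.append((x, label))

train, tune, test = split_train_tune_test(examples)

# "Generate solutions" (one per candidate k), "select best" on the tune set.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, tune, k))

# The test set is touched once, only to estimate accuracy on future examples.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

Keeping the test set out of the selection step is the point of the three-way split: if the tune set were also used to report accuracy, the estimate would be biased toward the chosen setting.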

16 N-fold Cross Validation
Can be used to
1) estimate future accuracy (via test sets)
2) choose parameter settings (via tuning sets)
Method
1) Randomly permute examples
2) Divide into N bins
3) Train on N - 1 bins, measure accuracy on bin ‘left out’
4) Compute average accuracy on held-out sets
[Figure: Examples divided into Fold 1, Fold 2, Fold 3, Fold 4, Fold 5]
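
The four-step method above can be sketched directly. The learner here is a deliberately trivial majority-class predictor, chosen only so the example stays short; it, the toy data, and the function names are assumptions, not part of the lecture.

```python
import random

def n_fold_cross_validation(examples, n_folds, train_fn, accuracy_fn, seed=0):
    """Return the average held-out accuracy over n_folds folds."""
    permuted = examples[:]
    random.Random(seed).shuffle(permuted)                   # 1) randomly permute
    bins = [permuted[i::n_folds] for i in range(n_folds)]   # 2) divide into N bins
    accuracies = []
    for held_out in range(n_folds):
        # 3) train on the other N - 1 bins, measure on the bin left out
        train = [ex for i, b in enumerate(bins) if i != held_out for ex in b]
        model = train_fn(train)
        accuracies.append(accuracy_fn(model, bins[held_out]))
    return sum(accuracies) / n_folds                        # 4) average accuracy

def train_majority(train):
    """Hypothetical learner: always predict the most common training label."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def accuracy_of_constant(model, held_out):
    """Accuracy of a constant prediction on the held-out bin."""
    return sum(1 for _, label in held_out if label == model) / len(held_out)

# Toy data: 100 examples, 75 labeled True and 25 labeled False.
examples = [(i, i % 4 != 0) for i in range(100)]
mean_accuracy = n_fold_cross_validation(
    examples, 5, train_majority, accuracy_of_constant
)
print(mean_accuracy)
```

Each example is held out exactly once, so every example contributes to the accuracy estimate while never being scored by a model that trained on it.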

17 Dealing with Data that Comes from Larger Objects
Assume examples are sentences contained in books
Or web pages from computer science depts
Or short DNA sequences from genes
(Usually) need to cross validate on the LARGER objects
Eg, first partition books into N folds, then collect sentences from a fold’s books
[Figure: Sentences in Books, grouped into Fold1, Fold2]
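
A minimal sketch of the book example: the folds are built over the larger objects first, and the examples follow their object into its fold. The data and the names `grouped_folds` and `group_of` are hypothetical.

```python
import random
from collections import defaultdict

def grouped_folds(examples, group_of, n_folds, seed=0):
    """Partition the larger objects into folds, then collect their examples."""
    groups = defaultdict(list)
    for ex in examples:
        groups[group_of(ex)].append(ex)

    group_ids = sorted(groups)                # fixed order before shuffling
    random.Random(seed).shuffle(group_ids)    # randomly permute the larger objects

    folds = [[] for _ in range(n_folds)]
    for i, gid in enumerate(group_ids):
        folds[i % n_folds].extend(groups[gid])  # all of a book's sentences move together
    return folds

# Hypothetical data: 6 books with 4 sentences each, stored as (book, sentence).
sentences = [
    (f"book{b}", f"sentence {s} of book {b}")
    for b in range(6)
    for s in range(4)
]
folds = grouped_folds(sentences, group_of=lambda ex: ex[0], n_folds=3)

for i, fold in enumerate(folds, start=1):
    print(f"Fold{i}:", sorted({book for book, _ in fold}))
```

Because no book is split across folds, a model is never tested on sentences from a book it has already seen during training, which would otherwise inflate the accuracy estimate.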

