
1 Short Introduction to Machine Learning
Instructor: Rada Mihalcea

2 Learning?
What can we learn from here?
- If Sky = Sunny and Air Temperature = Warm => Enjoy Sport = Yes
- If Sky = Sunny => Enjoy Sport = Yes
- If Air Temperature = Warm => Enjoy Sport = Yes
- If Sky = Sunny and Air Temperature = Warm and Wind = Strong => Enjoy Sport = Yes
- ??

3 What is machine learning?
- (H. Simon) "Any process by which a system improves performance."
- (T. Mitchell) "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
- Machine learning is concerned with designing computer programs that improve their performance through experience.

4 Related areas
- Artificial intelligence
- Probability and statistics
- Computational complexity theory
- Information theory
- Human language technology

5 Applications of ML
- Learning to recognize spoken words: SPHINX (Lee 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
- Learning to classify celestial objects (Fayyad et al. 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
- Learning to translate between languages
- Learning to classify texts into categories: Web directories

6 Main directions in ML
- Data mining: finding patterns in data; using "historical" data to make a decision (e.g., predict weather based on current conditions)
- Self-customization: automatic feedback integration; adapting to user "behaviour"; recommender systems
- Writing applications that cannot be programmed by hand, in particular because they involve huge amounts of data: speech recognition, handwriting recognition, text understanding

7 Terminology
- Learning is performed from EXAMPLES (or INSTANCES).
- An example contains ATTRIBUTES or FEATURES, e.g. Sky, Air Temperature, Water.
- In concept learning, we want to learn the value of the TARGET ATTRIBUTE.
- Classification problems, binary case: +/- stands for positive/negative.
- Attributes have VALUES:
  - a single value (e.g. Warm)
  - ? indicates that any value is possible for this attribute
  - Ø indicates that no value is acceptable
- All the features in an example together are referred to as a FEATURE VECTOR.

8 Terminology
- Feature vector for our learning problem: (Sky, Air Temp, Humidity, Wind, Water, Forecast); the target attribute is EnjoySport.
- How do we represent "Aldo enjoys sports only on cold days with high humidity"? (?, Cold, High, ?, ?, ?)
- How about "Emma enjoys sports regardless of the weather"?
- Hypothesis = the entire set of vectors that cover the given examples.
- Most general hypothesis: (?, ?, ?, ?, ?, ?)
- Most specific hypothesis: (Ø, Ø, Ø, Ø, Ø, Ø)
- How many hypotheses can be generated for our feature vector? (See the sketch below.)
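
One possible answer to the counting question; a minimal sketch that assumes the value counts of Mitchell's EnjoySport data (Sky with 3 possible values, the other five attributes with 2 each), which this running example appears to follow:

    # Counting hypotheses over (value | ? | Ø) slots; the value counts below
    # are an assumption taken from Mitchell's EnjoySport example.
    from math import prod

    values = {"Sky": 3, "AirTemp": 2, "Humidity": 2,
              "Wind": 2, "Water": 2, "Forecast": 2}

    # each slot can hold any of its values, or "?", or "Ø"
    syntactic = prod(v + 2 for v in values.values())
    # every hypothesis containing "Ø" covers nothing, so all of them are
    # semantically the same; that leaves one extra "?" choice per slot
    semantic = 1 + prod(v + 1 for v in values.values())

    print(syntactic)  # 5120
    print(semantic)   # 973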

9 Task in machine learning
Given:
- a set of examples X
- a set of hypotheses H
- a target concept c
Determine:
- a hypothesis h in H such that h(x) = c(x) for every example x
Practically, we want to determine those hypotheses that would best fit our examples:
- (Sunny, ?, ?, ?, ?, ?) => Yes
- (?, Warm, ?, ?, ?, ?) => Yes
- (Sunny, Warm, ?, ?, ?, ?) => Yes
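
To make h(x) = c(x) concrete, here is a minimal sketch of the covering test behind these vectors; the specific instance values are made up in the style of the EnjoySport example:

    # Does a hypothesis vector cover an example vector?
    # "?" matches any value; the empty symbol "Ø" matches nothing.
    def covers(hypothesis, example):
        return all(h == "?" or h == e for h, e in zip(hypothesis, example))

    example = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
    print(covers(("Sunny", "?", "?", "?", "?", "?"), example))     # True
    print(covers(("Sunny", "Warm", "?", "?", "?", "?"), example))  # True
    print(covers(("Rainy", "?", "?", "?", "?", "?"), example))     # False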

10 Machine learning applications
- Until now: a toy example, deciding whether X enjoys sport given the current conditions and the forecast.
- Practical problems: part-of-speech tagging (how?), word sense disambiguation, text categorization, chunking.
- Any problem that can be modeled through examples should support learning.

11 Machine learning algorithms
- Concept learning via search through the general-to-specific ordering of hypotheses
- Decision tree learning
- Instance-based learning
- Rule-based learning
- Neural networks
- Bayesian learning
- Genetic algorithms

12 Basic elements of information theory
- How to determine which attribute is the best classifier? Measure the information gain of each attribute.
- Entropy characterizes the (im)purity of an arbitrary collection of examples.
- Given a collection S of positive and negative examples, with p and q the proportions of positive and negative examples:
  Entropy(S) = -p log2 p - q log2 q
- Entropy is at its maximum when p = q = 1/2, and at its minimum when p = 1 and q = 0.
- By convention, 0 log2 0 = 0.
- Example: S contains 14 examples, 9 positive and 5 negative:
  Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
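
A quick check of the numbers above; a minimal Python sketch, assuming base-2 logarithms as is standard for entropy:

    from math import log2

    def entropy(p, q):
        # by convention, a term with probability 0 contributes 0
        terms = [x * log2(x) for x in (p, q) if x > 0]
        return 0.0 - sum(terms)

    print(entropy(9/14, 5/14))  # ~0.940, matching the slide
    print(entropy(0.5, 0.5))    # 1.0, the maximum
    print(entropy(1.0, 0.0))    # 0.0, the minimum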

13 Basic elements of information theory
- Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute.
- Many learning algorithms make their decisions based on information gain.
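
Concretely, for an attribute A that partitions S into subsets S_v (one per value v of A): Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v). A minimal sketch, with made-up attribute values and labels (the entropy helper is repeated so the snippet stands alone):

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(attr_values, labels):
        n = len(labels)
        gain = entropy(labels)
        for v in set(attr_values):
            subset = [l for a, l in zip(attr_values, labels) if a == v]
            gain -= len(subset) / n * entropy(subset)
        return gain

    labels = ["yes", "yes", "no", "no"]
    # an attribute that separates the classes perfectly: gain 1.0
    print(information_gain(["sunny", "sunny", "rain", "rain"], labels))
    # an attribute that tells us nothing: gain 0.0
    print(information_gain(["warm", "cold", "warm", "cold"], labels))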

14 Basic elements of information theory (slide content not captured in the transcript)

15 Decision trees (slide content not captured in the transcript)

16 Decision trees (slide content not captured in the transcript)

17 Decision trees
- Have the capability of generating rules:
  IF outlook = sunny AND temperature = hot THEN play tennis = no
- Powerful! It would be very hard for a human to derive such rules by hand.
- C4.5 (Quinlan) and its predecessor ID3
- Integral part of MLC++
- Integral part of Weka (for Java)
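
A minimal ID3-style sketch of how rules of this form fall out of a learned tree: greedily split on the attribute with the highest information gain and print one rule per leaf. The four-example dataset is made up for illustration; real implementations such as C4.5 or Weka's J48 also handle numeric attributes, pruning, and missing values.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain(rows, labels, attr):
        total, n = entropy(labels), len(rows)
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            total -= len(subset) / n * entropy(subset)
        return total

    def id3_rules(rows, labels, attrs, conds=()):
        if len(set(labels)) == 1:      # pure node: emit one rule
            print("IF", " AND ".join(conds) or "TRUE", "THEN play =", labels[0])
            return
        if not attrs:                  # no attributes left: majority vote
            print("IF", " AND ".join(conds), "THEN play =",
                  Counter(labels).most_common(1)[0][0])
            return
        best = max(attrs, key=lambda a: gain(rows, labels, a))
        for value in set(r[best] for r in rows):
            idx = [i for i, r in enumerate(rows) if r[best] == value]
            id3_rules([rows[i] for i in idx], [labels[i] for i in idx],
                      [a for a in attrs if a != best],
                      conds + (f"{best}={value}",))

    rows = [{"outlook": "sunny", "temperature": "hot"},
            {"outlook": "sunny", "temperature": "mild"},
            {"outlook": "overcast", "temperature": "hot"},
            {"outlook": "rain", "temperature": "mild"}]
    labels = ["no", "no", "yes", "yes"]
    id3_rules(rows, labels, ["outlook", "temperature"])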

18 Instance-based algorithms
- Distance between examples. Remember the WSD algorithm?
- K-nearest neighbour:
  - given a set of examples X, each described by feature values (a1(x), a2(x), ..., an(x))
  - classify a new instance based on the distance between the current example and all examples in training
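
A minimal k-nearest-neighbour sketch using Euclidean distance over numeric feature vectors; the training points, labels, and the choice k = 3 are made up for illustration:

    from collections import Counter
    from math import dist  # Euclidean distance, Python 3.8+

    def knn_classify(train, labels, query, k=3):
        # sort training examples by distance to the query, take the k closest
        nearest = sorted(zip(train, labels),
                         key=lambda p: dist(p[0], query))[:k]
        # majority vote among the k nearest labels
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
    labels = ["yes", "yes", "no", "no", "no"]
    print(knn_classify(train, labels, (1.1, 0.9)))  # yes
    print(knn_classify(train, labels, (5.0, 4.9)))  # no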

19 Instance-based algorithms
- Take into account every single example. Advantage? Disadvantage?
- "Do not forget exceptions"
- Very good for NLP tasks: WSD, POS tagging

20 Measuring learning performance
- Error on test data:
  - sample error: wrong cases / total cases on the test set
  - true error (generalization error): estimated as an error range starting from the sample error
- Cross-validation schemes, for more accurate evaluations. The 10-fold cross-validation scheme:
  - divide the training data into 10 sets
  - use one set for testing, and the other 9 sets for training
  - repeat 10 times, measure the average accuracy
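
A minimal sketch of the 10-fold scheme; the fold assignment, the stand-in majority-class learner, and the toy data are all illustrative, and any learner with the same train-and-predict shape could be plugged in:

    def cross_validate(examples, labels, train_and_predict, folds=10):
        n = len(examples)
        accuracies = []
        for f in range(folds):
            test_idx = set(range(f, n, folds))   # every folds-th example
            train_x = [x for i, x in enumerate(examples) if i not in test_idx]
            train_y = [y for i, y in enumerate(labels) if i not in test_idx]
            test_x = [examples[i] for i in sorted(test_idx)]
            test_y = [labels[i] for i in sorted(test_idx)]
            predictions = train_and_predict(train_x, train_y, test_x)
            correct = sum(p == t for p, t in zip(predictions, test_y))
            accuracies.append(correct / len(test_y))
        return sum(accuracies) / folds

    # trivial stand-in learner: always predict the majority training label
    def majority(train_x, train_y, test_x):
        guess = max(set(train_y), key=train_y.count)
        return [guess] * len(test_x)

    labels = ["yes"] * 12 + ["no"] * 8
    print(cross_validate(list(range(20)), labels, majority))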

21 Practical issues: using Weka
- Weka: free Java implementation of many learning algorithms
  + boosting
  + capability of handling very large data sets
  + automatic cross-validation
- To run an experiment, supply a training file (file.arff); a test file is optional, and if not present, Weka will evaluate through cross-validation.

22 Specify the feature types
- Discrete: value drawn from a set of nominal values
- Continuous: numeric value
Example: golf data
  Play, Don't Play. | the target attribute
  outlook: sunny, overcast, rain. | features
  temperature: real.
  humidity: real.
  windy: true, false.

23 Weather data
  sunny, 85, 85, false, Don't Play
  sunny, 80, 90, true, Don't Play
  overcast, 83, 78, false, Play
  rain, 70, 96, false, Play
  rain, 68, 80, false, Play
  rain, 65, 70, true, Don't Play
  overcast, 64, 65, true, Play
  sunny, 72, 95, false, Don't Play
  sunny, 69, 70, false, Play
  rain, 75, 80, false, Play
  sunny, 75, 70, true, Play
  overcast, 72, 90, true, Play
  overcast, 81, 75, false, Play
  rain, 71, 80, true, Don't Play
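
For Weka, the same data would be written as an ARFF file. A hypothetical rendering is below; the relation name is our choice, and the class values are renamed to yes/no to avoid quoting the space and apostrophe in "Don't Play". The experiment could then be run from the command line with, for instance, java weka.classifiers.trees.J48 -t golf.arff, which evaluates by 10-fold cross-validation when no test file is given.

    % hypothetical ARFF rendering of the golf/weather data above
    @relation golf

    @attribute outlook {sunny, overcast, rain}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {true, false}
    @attribute play {yes, no}

    @data
    sunny, 85, 85, false, no
    sunny, 80, 90, true, no
    overcast, 83, 78, false, yes
    rain, 70, 96, false, yes
    rain, 68, 80, false, yes
    rain, 65, 70, true, no
    overcast, 64, 65, true, yes
    sunny, 72, 95, false, no
    sunny, 69, 70, false, yes
    rain, 75, 80, false, yes
    sunny, 75, 70, true, yes
    overcast, 72, 90, true, yes
    overcast, 81, 75, false, yes
    rain, 71, 80, true, no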

24 Running Weka
- Check "Short Intro to Weka".

