What is machine learning? 1. A very trivial machine learning tool K-Nearest-Neighbors (KNN) The predicted class of the query sample depends on the voting.


1 What is machine learning?

2 A very trivial machine learning tool: K-Nearest-Neighbors (KNN)
The predicted class of the query sample depends on the voting among its k nearest neighbors.
[Figure: training samples of two classes, O and X, surround a query sample marked ?]

3 When k = 3
[Figure: the query sample's class is decided by majority vote among its 3 nearest neighbors]

4 When k = 5
[Figure: the query sample's class is decided by majority vote among its 5 nearest neighbors]
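The voting rule of slides 2 to 4 can be sketched in Python as below. This is a minimal illustration, not code from the slides: Euclidean distance, the function name `knn_predict`, and the six toy training samples are all assumptions, chosen so that the same query is labeled O with k = 3 but X with k = 5.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Predict the class of `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query (assumed metric)
    neighbors = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns [(label, count)]; equal counts keep first-encountered order
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, class)
train = [
    ((1.0, 0.0), "O"),   # distance 1.0 from the origin
    ((0.0, 1.2), "O"),   # distance 1.2
    ((-1.4, 0.0), "X"),  # distance 1.4
    ((0.0, -1.6), "X"),  # distance 1.6
    ((1.8, 1.8), "X"),   # distance about 2.55
    ((3.0, 3.0), "O"),   # distance about 4.24
]

query = (0.0, 0.0)
print(knn_predict(train, query, k=3))  # O: two O votes against one X
print(knn_predict(train, query, k=5))  # X: three X votes against two O
```

Changing only k flips the prediction, which is exactly why the choice of k matters.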

5 Although KNN is very trivial, it can learn
Example: in vitro fertilization
–Given: embryos described by 60 features
–Problem: selection of embryos that will survive
–Data: historical records of embryos and their outcomes
Given a set of known instances, predict the outcome for newly arriving instances.
So, KNN learned something related to “the definition of embryo goodness”.

6 Can machines really learn?
Notice that here we call KNN a machine.
Definitions of “learning” from the dictionary:
–To get knowledge of by study, experience, or being taught
–To become aware by information or from observation
–To commit to memory
–To be informed of, ascertain; to receive instruction
Difficult to measure; trivial for computers
Operational definition:
–Things learn when they change their behavior in a way that makes them perform better in the future
Does a slipper learn?

7 In short, machine learning is:
[Diagram: training data (a set of known instances) provides knowledge/information to a machine (e.g. KNN); given testing data (a query instance), the machine produces an outcome (the class of the query instance)]

8 Furthermore, learning means:
[Diagram: same as slide 7, training data, machine (e.g. KNN), testing data, outcome, knowledge/information]
When the training data increases, the machine delivers better outcomes (e.g. higher accuracy).
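A toy way to see "more training data, better outcome" is to measure a 1-nearest-neighbor classifier on a fixed testing set while the training set grows. Everything here is hypothetical: a one-dimensional problem whose true class is X when x > 0.3, evenly spaced training and testing points, and the helper names `true_label`, `nn_predict`, and `accuracy_with`. Real data will not always improve this smoothly.

```python
def true_label(x):
    # Hypothetical ground truth: class X to the right of 0.3, class O otherwise
    return "X" if x > 0.3 else "O"

def nn_predict(train, x):
    # 1-NN: return the label of the training point closest to x
    return min(train, key=lambda sample: abs(sample[0] - x))[1]

def accuracy_with(n_train, n_test=1000):
    # Evenly spaced training points, labeled by the ground truth
    train = [((i + 0.5) / n_train, true_label((i + 0.5) / n_train)) for i in range(n_train)]
    # A fixed, evenly spaced testing set shared by every training size
    test_x = [(j + 0.5) / n_test for j in range(n_test)]
    correct = sum(nn_predict(train, x) == true_label(x) for x in test_x)
    return correct / n_test

for n in (2, 4, 20):
    print(n, accuracy_with(n))  # prints 2 0.8, then 4 0.95, then 20 1.0
```

With few training points, the learned class boundary sits far from 0.3; adding points moves it closer, so more testing samples are classified correctly.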

9 Usually, we don't reinvent the wheel
[Diagram: same as slide 7, training data, machine (e.g. KNN), testing data, outcome, knowledge/information]
Converting data (e.g. embryos) into vectors is not trivial.

10 Feature

11 Data representation
Format (for LIBSVM):
–Example line: 1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667
–Though this format is for LIBSVM, a famous implementation of the support vector machine (SVM), all other machine learning tools share the same concept.
[Figure: in the example line, "1" is marked as the label and "1:-0.555556 2:0.25 3:-0.864407 4:-0.916667" as the feature]
The label is also the answer, or class, of a sample.
The feature part is also called the features or the feature vector.
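A minimal sketch of reading and writing the line format above, assuming whitespace-separated fields, 1-based feature indices, and omitted indices meaning zero (the sparse convention of this format). The helper names are hypothetical, this is not LIBSVM's own reader, and comment lines or malformed input are not handled.

```python
def parse_libsvm_line(line):
    """Split '<label> <index>:<value> ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features

def to_dense(features, n_features):
    # Indices start at 1; any index missing from the line is taken as 0
    return [features.get(i, 0.0) for i in range(1, n_features + 1)]

def format_libsvm_line(label, vector):
    # Write back, skipping zero entries as the sparse format allows
    items = [f"{i}:{v}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([f"{label:g}"] + items)

line = "1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667"
label, features = parse_libsvm_line(line)
vector = to_dense(features, 4)
print(label, vector)
print(format_libsvm_line(label, vector))  # reproduces the original line
```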

12 Label and feature
The label is defined by experts
–usually biologists, in bioinformatics
Data representation is also called feature encoding or feature extraction
–you may not know which features are important
–you may not have the key feature
–you need domain knowledge to design good features
–if you don't design new algorithms (most researchers don't), the only thing you can do is design new features
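As one hypothetical example of feature encoding (the slides do not name a specific encoding), a protein sequence can be turned into a fixed-length vector of amino acid composition, the fraction of each of the 20 standard amino acids, and then written with an expert-given label in the LIBSVM format. The sequence "MKKLLA", the label 1, and the function names are made up for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one feature each

def aa_composition(sequence):
    """Encode a protein sequence as the fraction of each standard amino acid."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    # Non-standard letters count toward the length but fall into no feature
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]

def to_libsvm(label, vector):
    # Sparse output: 1-based index, six decimals, zero entries omitted
    items = [f"{i}:{v:.6f}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + items)

print(to_libsvm(1, aa_composition("MKKLLA")))
# 1 1:0.166667 9:0.333333 10:0.333333 11:0.166667
```

Whether composition is a good feature depends entirely on the problem; this is where domain knowledge enters.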

13 Evaluation

14 Evaluation issues
Recall that in the KNN algorithm, predicting the class of a query sample requires comparing it to a collection of reference samples whose classes are known.
This collection is called the training set, and these reference samples are called training samples.
When evaluating, we need to know the classes of the query samples so that we can compare the answers with the predictions.
These query samples with known classes are called the testing set, or testing samples.
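Once the answers for the testing samples are known, comparing them with the predictions is straightforward; accuracy, the fraction of matches, is one common summary. The function name and both lists below are hypothetical, not results from any real model.

```python
def accuracy(answers, predictions):
    """Fraction of testing samples whose prediction matches the known answer."""
    if len(answers) != len(predictions):
        raise ValueError("answers and predictions must have the same length")
    if not answers:
        raise ValueError("empty testing set")
    correct = sum(a == p for a, p in zip(answers, predictions))
    return correct / len(answers)

answers = ["O", "X", "X", "O"]      # known classes of the testing samples (hypothetical)
predictions = ["O", "X", "O", "O"]  # classes predicted by, e.g., KNN (hypothetical)
print(accuracy(answers, predictions))  # 0.75
```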

15 Theoretically, the answer to a query is unnecessary
Actually, it should not exist; otherwise, we would not need to predict. However, we always need to evaluate our methods/features, so in this course we always have the answers for the testing set.

16 Sample arrangement
How should n samples whose classes are known be split into training and testing sets?
It gets worse if the algorithm has parameters:
–Is KNN a method?
–Are 3NN and 5NN different methods?
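One common answer to the splitting question (not stated on the slide) is cross-validation: shuffle the n samples, deal them into disjoint folds, and let each fold serve once as the testing set while the remaining samples form the training set. A minimal sketch, with hypothetical function names and a fixed seed so the split is reproducible:

```python
import random

def fold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal them into n_folds disjoint testing folds."""
    if not 2 <= n_folds <= n:
        raise ValueError("need 2 <= n_folds <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]

def train_test_pairs(n, n_folds, seed=0):
    # Each fold is the testing set once; all other folds form the training set
    folds = fold_indices(n, n_folds, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train_idx, test_idx

for train_idx, test_idx in train_test_pairs(10, 5):
    print(len(train_idx), len(test_idx))  # 8 2 on every line
```

Running 3NN and 5NN on the same pairs gives a like-for-like comparison. If k itself is chosen by looking at the testing folds, those folds no longer give an unbiased estimate, which is one reason parameters make sample arrangement harder.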

