Machine Learning & Bioinformatics, Tien-Hao Chang (Darby Chang)


1 Machine Learning & Bioinformatics, Tien-Hao Chang (Darby Chang)

2 What is machine learning?

3 K-Nearest-Neighbors (KNN)
A very trivial machine learning tool
The predicted class of the query sample is decided by a vote among its k nearest neighbors
[Figure: samples labeled O and X scattered around a query sample marked ?]

4 When k = 3
[Figure: the same samples; the query sample is now labeled O]

5 When k = 5
[Figure: the same samples; the query sample is now labeled X]
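The voting rule on these slides can be sketched in a few lines of Python. This is a minimal sketch, not the course's implementation: the function name knn_predict, the use of Euclidean distance, and the 2-D coordinates are all assumptions made for illustration (the coordinates are not taken from the slide figure). They are chosen so that the vote flips between k = 3 and k = 5, as in the slides.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Return the majority label among the k training samples closest to query."""
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda s: math.dist(s[0], query))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D samples (NOT the coordinates from the slide figure).
training = [
    ((1.0, 0.0), "O"),
    ((0.0, 1.2), "O"),
    ((1.5, 0.0), "X"),
    ((0.0, 2.0), "X"),
    ((-2.2, 0.0), "X"),
    ((3.0, 3.0), "O"),
]
query = (0.0, 0.0)

print("k = 3:", knn_predict(training, query, 3))
print("k = 5:", knn_predict(training, query, 5))
```

With two classes, an odd k avoids tied votes; how ties should be broken otherwise is a design choice this sketch does not address.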

6 Although KNN is very trivial, it can, given a set of known instances, predict the outcome for newly arriving instances
Example: in vitro fertilization
–Given: embryos described by 60 features
–Problem: selection of embryos that will survive
–Data: historical records of embryos and outcome
So KNN has learned something related to the definition of embryo goodness

7 Can machines really learn?
Notice that here we call KNN a machine
Definitions of learning from the dictionary:
–To get knowledge of by study, experience, or being taught
–To become aware by information or from observation
–To commit to memory
–To be informed of, ascertain; to receive instruction
Operational definition:
–Things learn when they change their behavior in a way that makes them perform better in the future
Annotations on the slide: difficult to measure; trivial for computers
Does a slipper learn?

8 In short, machine learning is
[Diagram]
–Machine: e.g. KNN
–Training data: a set of known instances
–Testing data: a query instance
–Outcome: class of the query instance
–Knowledge/Information

9 Furthermore, learning is
[Diagram]
–Machine: e.g. KNN
–Training data: a set of known instances
–Testing data: a query instance
–Outcome: class of the query instance
–Knowledge/Information
When the training data increases, the machine delivers a better outcome (e.g. higher accuracy)

10 Usually, we don't reinvent the wheel
[Diagram]
–Machine: e.g. KNN
–Training data: a set of known instances
–Testing data: a query instance
–Outcome: class of the query instance
–Knowledge/Information
Converting data (e.g. embryos) to vectors is not trivial

11 Feature

12 Data representation
Format (for LIBSVM)
–example line: 1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667, where the leading 1 is the label and the index:value pairs are the features
–though this format is for LIBSVM, a well-known implementation of the support vector machine (SVM), other machine learning tools share the same concept
The label is the answer, or class, of a sample
The features are also called the feature vector
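To make the label/feature split concrete, the line format shown on this slide can be read with a short Python function. This is a minimal sketch: the name parse_libsvm_line is hypothetical (it is not part of LIBSVM), and it assumes integer class labels and one sample per line.

```python
def parse_libsvm_line(line):
    """Split one 'label index:value ...' line into (label, {index: value})."""
    fields = line.split()
    # The first field is the label (assumed here to be an integer class).
    label = int(fields[0])
    # Every remaining field is an index:value pair describing one feature.
    features = {}
    for pair in fields[1:]:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return label, features

# The sample line from the slide.
label, features = parse_libsvm_line("1 1:-0.555556 2:0.25 3:-0.864407 4:-0.916667")
print("label:", label)
print("features:", features)
```

Storing the features in a dictionary keyed by index keeps the sketch compatible with sparse lines, where some indices are simply absent.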

13 Label and feature
The label is defined by experts
–usually biologists, in bioinformatics
Data representation is also called feature encoding or feature extraction
–you may not know which feature is important
–you may not have the key feature
–you need domain knowledge to design good features
–if you don't design new algorithms (most researchers don't), the only thing you can do is design new features

14 Evaluation

15 Evaluation issues
Recall that in the KNN algorithm, predicting the class of a query sample requires comparing it to a collection of reference samples whose classes are known
This collection is called the training set, and the reference samples are called training samples
When evaluating, we need to know the classes of the query samples so that we can compare the predictions with the answers
Query samples with known classes are called the testing set, or testing samples

16 Theoretically, the answer to a query is not needed
Actually, it should not exist, or we would not need to predict. However, we always need to evaluate our methods/features, and thus we always have the answers for the testing set in this course.

17 Sample arrangement
How should n samples, whose classes are known, be split into training and testing sets?
It gets worse if the algorithm has parameters
–is KNN a method?
–are 3NN and 5NN different methods?
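One common answer to the splitting question is a random holdout split, sketched below in Python. This is only one possible arrangement, not the one prescribed by the course: the function names, the 30% test fraction, the fixed seed, and the synthetic two-cluster data are all assumptions made for illustration. The sketch evaluates 3NN and 5NN on the same split, so the two parameter settings can be compared as if they were two methods.

```python
import math
import random
from collections import Counter

def knn_predict(training, query, k):
    """Return the majority label among the k training samples closest to query."""
    by_distance = sorted(training, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def holdout_accuracy(samples, k, test_fraction=0.3, seed=0):
    """Shuffle the samples, hold out a fraction for testing, return KNN accuracy."""
    rng = random.Random(seed)
    shuffled = samples[:]            # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    testing, training = shuffled[:n_test], shuffled[n_test:]
    # A prediction is correct when it equals the known class of the testing sample.
    correct = sum(knn_predict(training, x, k) == y for x, y in testing)
    return correct / len(testing)

# Hypothetical data: two Gaussian clusters, one per class.
rng = random.Random(42)
samples = [((rng.gauss(0, 1), rng.gauss(0, 1)), "O") for _ in range(50)]
samples += [((rng.gauss(3, 1), rng.gauss(3, 1)), "X") for _ in range(50)]

for k in (3, 5):
    print(f"{k}NN accuracy: {holdout_accuracy(samples, k):.2f}")
```

Because the seed is fixed, both values of k are scored on exactly the same testing samples. Choosing k by looking at this testing accuracy would make the reported number optimistic, which is part of why parameters make the arrangement harder.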

18 Today's exercise

19 Single-class prediction
Design your own select, feature, buy and sell programs. Upload and test them in our simulation system. Finally, commit your best version before 23:59 10/1 (Mon).

