+ Get Rich and Cure Cancer with Support Vector Machines (Your Summer Projects)


1 + Get Rich and Cure Cancer with Support Vector Machines (Your Summer Projects)

2 + Kernel Trick https://www.youtube.com/watch?v=3liCbRZPrZA

3 + This is achieved with a polynomial kernel Feature map: Kernel:
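As a small illustration of how a polynomial kernel stands in for an explicit feature map, the sketch below assumes the homogeneous degree-2 kernel K(x, y) = (x · y)^2 on 2-D inputs, with feature map (x1^2, sqrt(2) x1 x2, x2^2). This particular kernel and map are assumptions chosen for illustration and may differ from the ones on the slide.

```python
import math

def feature_map(x):
    """Explicit degree-2 feature map for a 2-D input (assumed for illustration)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def inner(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = (x . y)^2."""
    return inner(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = inner(feature_map(x), feature_map(y))  # inner product in the transformed space
rhs = poly_kernel(x, y)                      # kernel evaluated in the original space
print(lhs, rhs)
```

Both quantities agree up to rounding, so the inner product in the 3-D transformed space can be computed without ever building the transformed vectors.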

4 + Optimization of transformed problem: Only kernel matters Dual Lagrangian for transformed problem: Optimal weight vector: Thus, optimal hyperplane:
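For reference, the textbook hard-margin form of these three expressions is sketched below. The symbols (training points x_i, labels y_i = ±1, multipliers α_i, feature map Φ, kernel K, offset b) are assumed notation and may not match the slide's.

```latex
% Dual Lagrangian for the transformed problem (maximize over \alpha)
L_D(\alpha) = \sum_i \alpha_i
  - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j),
\qquad \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0

% Optimal weight vector in the transformed space
w^{*} = \sum_i \alpha_i \, y_i \, \Phi(x_i)

% Optimal hyperplane, written using only the kernel
\sum_i \alpha_i \, y_i \, K(x_i, x) + b = 0
```

The feature map Φ appears only through K(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩, which is the sense in which only the kernel matters.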

5 + Kernel Trick We can choose the kernel without first defining a feature map. How to get a feature map from a kernel? Define i.e. map vectors in the original feature space to functions. Inner product on transformed space:
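One standard construction that fits this description is the reproducing-kernel map, sketched here under the assumption that K is symmetric and positive definite. The notation is assumed and may differ from the slide's.

```latex
% Map each vector x to the function "K with one argument fixed at x"
\Phi(x) = K(\cdot, x)

% Inner product on the transformed space, defined on these functions
% and extended linearly to their span
\langle \Phi(x), \Phi(y) \rangle = \langle K(\cdot, x), K(\cdot, y) \rangle = K(x, y)
```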

6 + Get rich off of support vectors

7 + Making 5-day forecasts of financial futures. Given data on the returns for 5 days, predict the return on the next day. To achieve this, we need to figure out which 5-day stretches tend to predict good returns on the 6th day, and which predict not-so-good returns. A training data set is used for this purpose.

8 + Making 5-day forecasts of financial futures

Day 1   Day 2   Day 3   Day 4   Day 5   |  Day 6
x_11    x_12    x_13    x_14    x_15    |  y_1
x_21    x_22    x_23    x_24    x_25    |  y_2
x_31    x_32    x_33    x_34    x_35    |  y_3
x_41    x_42    x_43    x_44    x_45    |  y_4
…       …       …       …       …       |  y_5

The Day 1 to Day 5 columns form a 5-dimensional feature space. The return on the 6th day is the class label for each data point. The routine learns how to classify 5-day-return data points by working with a training data set covering 500 days. It constructs a dividing hypersurface and uses it to decide what the 6th-day return should be for new data points.
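The table above can be built from a single return series with a sliding window. The sketch below is a minimal pure-Python version. The function name `make_windows`, the sample returns, and the choice to label a window +1 when the next-day return is positive and -1 otherwise are all assumptions for illustration.

```python
def make_windows(returns, window=5):
    """Build (5-day feature vector, label for the following day) pairs."""
    X, y = [], []
    for start in range(len(returns) - window):
        X.append(returns[start:start + window])   # days 1..5 of this stretch
        next_ret = returns[start + window]        # the "6th day" return
        y.append(1 if next_ret > 0 else -1)       # assumed labelling rule
    return X, y

# Hypothetical daily returns, for illustration only
returns = [0.01, -0.02, 0.005, 0.0, 0.013, -0.004, 0.007, 0.002]
X, y = make_windows(returns)
for features, label in zip(X, y):
    print(features, label)
```

Each row of `X` is one point in the 5-dimensional feature space, and `y` holds the class a classifier such as an SVM would be trained to predict.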

9 + Good results – you can try it yourself! Complete with R code: http://www.r-bloggers.com/trading-with-support-vector-machines-svm/

10 + Another example: gene expression in normal and cancerous tissue. Gene = unit of heredity. The human genome contains about 21,000 genes. (Public domain image from Wikipedia)

11 + Another example: gene expression in normal and cancerous tissue. DNA is transcribed to RNA, which is translated into proteins. This is the process whereby the “genetic code” is made manifest as biological characteristics (genotype gives rise to phenotype). (Wikimedia Commons image by Madeleine Price Ball)

12 + Big question: Which genes are responsible for which outcomes? In various tissues (e.g. tumor versus normal), which genes are active, hyperactive, and silent? We can use DNA microarrays to measure gene expression levels.

13 + DNA Microarray https://www.youtube.com/watch?v=_6ZMEZK-alM Source: National Human Genome Research Institute

14 + Using support vector machines to determine which genes are important for cancer classification

15 + Data. Data points: patients. Features: gene expression coefficients (activity level of a given gene). Feature space will have a huge number of dimensions, so we need a way to reduce it. We could examine all possible subspaces of feature space, but if the dimension N of feature space represents thousands of genes, the number of n-dimensional subspaces is too large for practical examination of each subspace.
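To get a feel for the scale, assume the subspaces in question are coordinate subspaces, i.e. choices of n genes out of N, so that their number is the binomial coefficient N! / (n! (N - n)!). The values N = 7000 and n = 10 below are hypothetical and chosen only for illustration.

```python
import math

N = 7000   # hypothetical number of genes (dimension of feature space)
n = 10     # hypothetical size of the gene subset to examine

# Number of ways to choose n of the N genes
count = math.comb(N, n)
print(f"{count:.3e} subsets of {n} genes out of {N}")
```

Even for a subset of only ten genes, an exhaustive search over every candidate subspace is out of reach, which motivates ranking the features instead.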

16 + Generate ranking of features. A ranking of features allows us to make a nested sequence of subspaces of feature space F and then determine the optimum subspace to work with. One possibility for ranking: work with each gene individually and get its correlation coefficient with the class label (i.e. find the correlation of gene expression level with the classification of tissue into tumor vs. normal, or into two different types of cancer). Note: ranking by correlation coefficient assumes all the features are independent of one another.
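A minimal sketch of correlation-based ranking follows, using the Pearson coefficient between each gene's expression level and a ±1 class label. The helper names and the tiny expression matrix are hypothetical. Using the absolute value of the coefficient as the score is also an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def rank_by_correlation(X, y):
    """Return feature indices ordered from most to least correlated with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]          # expression of gene j across patients
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores]

# Hypothetical data: rows are patients, columns are genes
X = [[2.0, 0.1, 5.0],
     [1.8, 0.9, 5.0],
     [0.2, 0.2, 5.0],
     [0.1, 0.8, 5.0]]
y = [1, 1, -1, -1]   # e.g. tumor = 1, normal = -1
ranking = rank_by_correlation(X, y)
print(ranking)
```

Each gene is scored on its own, which is where the independence assumption noted above enters: interactions between genes never affect the ranking.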

17 + Generate ranking of features. Another possible way to generate a ranking of features is sensitivity analysis. Start with a training data set already classified into two classes (cancerous vs. non-cancerous, or cancer type 1 vs. cancer type 2). Construct a cost function to estimate the error in classification. The sensitivity of the cost function to the removal of a feature measures the importance of that feature and allows the construction of a ranking.

18 + Ranking by Support Vector Machines: Recursive Feature Elimination. Idea of how to use SVM to identify important features: consider a cartoon scenario. [Figure: two classes plotted against axes x_1 and x_2.] This indicates that the x_1 direction is completely superfluous for classification.

19 + Ranking by Support Vector Machines. This suggests the following recursive algorithm for ranking features: Find the weight vector, using all features. Identify the least important feature as the one with the smallest (in absolute value) component of the weight vector. List that feature as least important and eliminate it from the data. Iterate the procedure with the least important feature thrown out. End result: a ranked list of features!
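The recursive procedure above can be sketched in pure Python as follows. The linear SVM here is a simple soft-margin trainer using full-batch subgradient descent on the hinge loss, swapped in to keep the example self-contained. The slides do not specify a solver, so the solver, its hyperparameters (`lam`, `lr`, `epochs`), and the toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                      # point violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y):
    """Rank features by recursive elimination; most important first."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Least important feature: smallest |w_k| among those remaining
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]

# Toy version of the cartoon scenario: only the second coordinate separates the classes
X = [[1.0, 2.0], [-1.0, 1.5], [1.0, -2.0], [-1.0, -1.5]]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y)
print(ranking)
```

In this toy data the first coordinate carries no class information, so its weight stays at zero and it is eliminated first. On real expression data with thousands of genes, a library SVM solver would be the practical choice.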

20 + Try this at home! Data is available online! http://www.broadinstitute.org/software/cprg/?q=node/55 Classify two types of leukemia.

