(1) Risk prediction by kernels and (2) Ranking SNPs
Usman Roshan

Disease risk prediction
Can we predict disease risk better with non-linear kernels?
What value of the regularization parameter C should one pick?
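
The linear versus non-linear question can be made concrete with a small sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) per SNP and that the polynomial kernel has the form (x·z + 1)^2; the coding, the constant 1 and the function names are illustrative assumptions, not details from the slides. Expanding the degree-2 kernel shows it equals an inner product over single-SNP terms, squared terms and all pairwise SNP products, so it can in principle capture two-SNP interactions that the linear kernel cannot.

```python
import math
from itertools import combinations

def linear_kernel(x, z):
    """Inner product of two genotype vectors (minor-allele counts)."""
    return sum(a * b for a, b in zip(x, z))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2; the constant 1 is an assumption."""
    return (linear_kernel(x, z) + 1.0) ** 2

def poly2_features(x):
    """Explicit feature map for (x.z + 1)^2: a constant, scaled single-SNP terms,
    squared terms and scaled pairwise SNP products (the interaction terms)."""
    feats = [1.0]
    feats += [math.sqrt(2) * a for a in x]
    feats += [a * a for a in x]
    feats += [math.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return feats

if __name__ == "__main__":
    x = [0, 1, 2, 1]  # hypothetical genotypes for 4 SNPs
    z = [1, 1, 0, 2]
    print(linear_kernel(x, z))  # 3
    print(poly2_kernel(x, z))   # 16.0
    # approximately 16.0 (up to floating-point rounding)
    print(linear_kernel(poly2_features(x), poly2_features(z)))
```

The explicit map grows quadratically with the number of SNPs, which is why an SVM works with the kernel form directly on genome-wide data; the map is shown here only to make the interaction terms visible.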

Experimental design
Data: WTCCC type 1 diabetes GWAS
Select 90% of the cases and controls as training data and the remaining 10% as test data
Learn an SVM on the training data and predict on the test data
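
The design above can be sketched end to end in pure Python. This is a minimal sketch under assumptions the slides do not state: the 90/10 split is stratified (done separately within cases and within controls), the SVM is trained with the kernelized Pegasos subgradient method rather than whatever solver the study used, the SVM parameter C is mapped to Pegasos' regularization as λ = 1/(nC), there is no bias term, and test predictions are scored by ROC AUC. All function names are hypothetical.

```python
import random

def linear_kernel(x, z):
    """Inner product of two genotype vectors."""
    return sum(a * b for a, b in zip(x, z))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2; the constant 1 is an assumption."""
    return (linear_kernel(x, z) + 1.0) ** 2

def stratified_split(cases, controls, train_frac=0.9, seed=0):
    """Put train_frac of the cases and of the controls into training, the rest into test.
    Each example is (genotypes, label) with label +1 for cases and -1 for controls."""
    rng = random.Random(seed)
    train, test = [], []
    for group, label in ((cases, 1), (controls, -1)):
        idx = list(range(len(group)))
        rng.shuffle(idx)
        cut = int(round(train_frac * len(group)))
        train += [(group[i], label) for i in idx[:cut]]
        test += [(group[i], label) for i in idx[cut:]]
    return train, test

def train_kernel_pegasos(data, kernel, C=1.0, iters=2000, seed=0):
    """Kernelized Pegasos (no bias term); returns a decision function x -> score."""
    rng = random.Random(seed)
    n = len(data)
    lam = 1.0 / (n * C)  # assumed correspondence between SVM C and Pegasos lambda
    alpha = [0] * n      # alpha[i] = number of margin violations by example i
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        xi, yi = data[i]
        s = sum(alpha[j] * yj * kernel(xj, xi)
                for j, (xj, yj) in enumerate(data) if alpha[j])
        if yi * s / (lam * t) < 1:
            alpha[i] += 1
    scale = 1.0 / (lam * iters)
    support = [(a * scale * y, x) for a, (x, y) in zip(alpha, data) if a]

    def decision(x):
        return sum(coef * kernel(xs, x) for coef, xs in support)
    return decision

def roc_auc(scores, labels):
    """Probability that a random case scores above a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

To compare kernels, one would call `train_kernel_pegasos(train, linear_kernel)` and `train_kernel_pegasos(train, poly2_kernel)` on the same split and compare `roc_auc` on the test examples. To study the choice of C, the training can be repeated over a grid of C values; picking C by test-set performance would bias the reported accuracy, so a validation split or cross-validation inside the training data is the usual practice.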

Type 1 diabetes (linear kernel)

Type 1 diabetes (degree-2 polynomial kernel)

Type 1 diabetes (both kernels)

Kernels
Are there kernels that can predict risk better than the linear model?
– Previous kernels in bioinformatics for sequence data
Can we combine kernels or automatically learn kernels?
– Multiple kernel learning
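
Combining kernels rests on one fact: a non-negative weighted sum of valid kernels is itself a valid kernel, because it equals the inner product of the concatenated, rescaled feature maps. The sketch below fixes the weights by hand (0.5 each, an arbitrary illustrative choice); multiple kernel learning would instead learn the weights from data, which this sketch does not do. The kernels and names are assumptions for illustration.

```python
import math
from itertools import combinations

def linear_kernel(x, z):
    """Inner product of two genotype vectors."""
    return sum(a * b for a, b in zip(x, z))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x.z + 1)^2; the constant 1 is an assumption."""
    return (linear_kernel(x, z) + 1.0) ** 2

def combined_kernel(x, z, weights=(0.5, 0.5)):
    """Fixed, non-negative weighting of the linear and degree-2 polynomial kernels."""
    w_lin, w_poly = weights
    return w_lin * linear_kernel(x, z) + w_poly * poly2_kernel(x, z)

def poly2_features(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    feats = [1.0] + [math.sqrt(2) * a for a in x] + [a * a for a in x]
    feats += [math.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return feats

def combined_features(x, weights=(0.5, 0.5)):
    """Concatenate the rescaled feature maps; their inner product is combined_kernel."""
    w_lin, w_poly = weights
    return ([math.sqrt(w_lin) * a for a in x]
            + [math.sqrt(w_poly) * f for f in poly2_features(x)])
```

Because `combined_features` exists for any non-negative weights, any such combination can be plugged into an SVM in place of a single kernel; the weights are the only extra choice, and learning them is what multiple kernel learning adds.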

Ranking SNPs
Chi-square
– Similar to other univariate statistics ("Ranking SNPs by different tests")
Can we rank with multivariate methods?
Rank SNPs for population structure using PCA
– "PCA correlated SNPs"
Can we rank with the SVM discriminant?
To answer this, we need simulations.
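
Chi-square ranking can be sketched as follows. The slides do not say which chi-square variant is used; this sketch assumes the 2 x 3 case/control by genotype test with 0/1/2 genotype coding. With 2 degrees of freedom the chi-square upper tail has the closed form exp(-x/2), which keeps the sketch free of dependencies. Function names are hypothetical.

```python
import math

def genotype_counts(genotypes):
    """Counts of genotypes 0, 1, 2 (minor-allele counts) at one SNP."""
    counts = [0, 0, 0]
    for g in genotypes:
        counts[g] += 1
    return counts

def chi_square_2x3(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x 3 (status x genotype) table.
    Columns with zero expected count are skipped (a simplification)."""
    rows = [case_counts, control_counts]
    total = sum(map(sum, rows))
    col_totals = [case_counts[k] + control_counts[k] for k in range(3)]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for k in range(3):
            expected = row_total * col_totals[k] / total
            if expected > 0:
                stat += (row[k] - expected) ** 2 / expected
    return stat

def p_value_df2(stat):
    """Upper tail of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

def rank_snps(cases, controls):
    """cases/controls: lists of individuals, each a list of genotypes per SNP.
    Returns SNP indices ordered from largest to smallest statistic, plus the statistics."""
    n_snps = len(cases[0])
    stats = [chi_square_2x3(genotype_counts(p[j] for p in cases),
                            genotype_counts(p[j] for p in controls))
             for j in range(n_snps)]
    order = sorted(range(n_snps), key=lambda j: stats[j], reverse=True)
    return order, stats
```

A multivariate ranking would replace the per-SNP statistic with a joint score, for example the magnitude of each SNP's weight in a linear SVM; the slide leaves open whether that ranks causal SNPs better.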

Ranking SNPs
Experimental design:
– Simulate a GWAS with known causal SNPs
– Rank SNPs in the simulated data and examine the number of causal variants among the top-ranked SNPs
– Bonferroni correction: significance threshold of 0.05/(number of SNPs)
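
The simulation design can be sketched as below. The generative model is an assumption, not the one used in the slides: genotypes follow Hardy-Weinberg proportions with a common minor allele frequency, disease status follows an additive logistic model on the causal SNPs, and the parameter values (MAF 0.3, per-allele log odds ratio 0.7, intercept -2) are arbitrary illustrations. SNPs are ranked with an allelic 2 x 2 chi-square test, whose 1-degree-of-freedom p-value is erfc(sqrt(x/2)).

```python
import math
import random

def simulate_gwas(n_cases, n_controls, n_snps, causal, log_or=0.7, maf=0.3,
                  intercept=-2.0, seed=0):
    """Draw individuals until enough cases and controls are collected (hypothetical model)."""
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # minor-allele count per SNP: two independent alleles, each minor with prob. maf
        g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n_snps)]
        logit = intercept + log_or * sum(g[j] for j in causal)
        if rng.random() < 1.0 / (1.0 + math.exp(-logit)):
            if len(cases) < n_cases:
                cases.append(g)
        elif len(controls) < n_controls:
            controls.append(g)
    return cases, controls

def allelic_chi_square(case_g, control_g):
    """Pearson chi-square on the 2 x 2 table of minor/major allele counts."""
    a = sum(case_g)
    b = 2 * len(case_g) - a
    c = sum(control_g)
    d = 2 * len(control_g) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0

def p_value_df1(stat):
    """Upper tail of chi-square with 1 degree of freedom: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def evaluate_ranking(cases, controls, causal, top_k, alpha=0.05):
    """Rank SNPs by chi-square; count causal SNPs in the top_k and apply Bonferroni."""
    n_snps = len(cases[0])
    stats = [allelic_chi_square([p[j] for p in cases], [p[j] for p in controls])
             for j in range(n_snps)]
    ranked = sorted(range(n_snps), key=lambda j: stats[j], reverse=True)
    causal_in_top = len(set(ranked[:top_k]) & set(causal))
    threshold = alpha / n_snps  # Bonferroni: 0.05 / (number of SNPs)
    significant = [j for j in ranked if p_value_df1(stats[j]) < threshold]
    return causal_in_top, threshold, significant
```

Swapping in a different ranking method (for example SVM weights) only requires replacing the `stats` list; the count of causal SNPs in the top k and the Bonferroni cut stay the same, which keeps the comparison between ranking methods fair.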

Rank of causal SNPs

Rank of causal SNPs (real data)

Risk prediction on simulated data