Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.


Introduction: Biomarkers are used to measure the progress of a disease or the physiological effects of a therapeutic intervention. They serve mainly as early warning signs for diseases such as cancer and inflammatory diseases.

The selection and design of the features used to represent each example in the learning process are very important and influence classifier performance. Each instance in a data set used by machine learning methods is represented by the same sequence of features, each with its own type, e.g. age, size.

The two major learning schemes in machine learning are unsupervised learning and supervised learning. Unsupervised learning: no prior information about the data or the output is given to the learner. Clustering is the classical, simple method of unsupervised learning.

Clustering methods: exclusive clustering (k-means algorithm), overlapping clustering (fuzzy C-means algorithm), hierarchical clustering, and probabilistic clustering.
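As a rough illustration of exclusive clustering, here is a minimal k-means sketch in plain Python. The data, the function name `kmeans`, and the choice to seed the centroids with the first k points are assumptions made for this example, not part of the original slides.

```python
import math

def kmeans(points, k, iterations=20):
    """Exclusive (hard) k-means: every point belongs to exactly one cluster."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical expression profiles: 4 genes measured in 2 conditions.
profiles = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Fuzzy C-means differs in that each point receives a degree of membership in every cluster instead of a single label.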

Supervised learning: the instances are given with known labels. The main goal is to build a classifier that predicts the class labels of future instances.
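To make the idea concrete, the sketch below trains a nearest-centroid classifier on labelled instances and then labels an unseen one. The classifier choice, the function names and the toy data are all assumptions for illustration; the slides do not prescribe a particular classifier here.

```python
import math
from collections import defaultdict

def fit_centroids(samples, labels):
    """Learn one mean feature vector (centroid) per class label."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }

def predict(centroids, x):
    """Assign the label whose centroid is closest to the new instance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical training data: two features per sample, known labels.
train_x = [(2.0, 0.5), (2.2, 0.4), (0.3, 1.9), (0.1, 2.1)]
train_y = ["tumor", "tumor", "normal", "normal"]
model = fit_centroids(train_x, train_y)
print(predict(model, (1.9, 0.6)))  # an unseen instance
```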

Biomarker – Biological Background: A biomarker is a gene, protein/peptide or metabolite in a biological system that indicates a physiological or pathological state and can be recognized or monitored. Gene expression studies bridge the gap between DNA information and trait information by dissecting biochemical pathways into intermediate components between genotype and phenotype.

Genomics is divided into two basic areas: structural genomics and functional genomics. Structural genomics is related to genetics. Functional genomics allows the detection of genes that are turned on or off at any given time depending on environmental factors.

One particularly powerful application of gene expression analysis is biomarker identification, which can be used for disease risk assessment, early detection, prognosis, prediction of response to therapy, and preventative measures. It remains a challenging task in cancer prevention and in the improvement of treatment outcomes.

Computational Biomarker (Feature) Selection: Classification of samples from gene expression datasets usually involves small numbers of samples and tens of thousands of genes. Feature selection methods fall into two main categories: filtering methods and wrapper approaches.

Filtering method: each gene is examined individually. Wrapper method: correlations among the genes are taken into account, and a ranking among the significant genes is established.
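A filtering method can be sketched as scoring every gene on its own and sorting by that score. The example below assumes a Welch two-sample t-statistic as the per-gene score; the statistic, the gene names and the expression values are illustrative assumptions, not taken from the slides.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch two-sample t-statistic for one gene's values in two classes."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def filter_rank(expression, labels):
    """Rank genes one at a time by |t|; correlations between genes are ignored."""
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(t_statistic(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: gene name -> expression value per sample.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "geneA": [5.1, 4.9, 5.3, 1.0, 1.2, 0.8],  # clearly separates the classes
    "geneB": [2.0, 2.5, 1.5, 2.1, 1.9, 2.4],  # no clear difference
    "geneC": [3.0, 3.4, 3.2, 2.5, 2.9, 2.7],  # moderate difference
}
print(filter_rank(expression, labels))
```

Because each gene is scored in isolation, two genes that are only informative together would both rank low here, which is the gap that wrapper methods aim to close.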

Support vector machine (SVM) algorithms and ridge regression (RR) are used to classify gene expression datasets and to assess classification accuracy. RR performs best in the comparison, further demonstrating the advantages of the wrapper method over the filtering methods.

Recursive feature elimination (RFE) for SVM starts from the “naïve” ranking of the genes. The naïve ranking is the first iteration of RFE, which yields a rank for each gene. SVM-RFE, which is superior to SVM without RFE, is also used with multivariate linear discriminant methods such as LDA and MSD.
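The elimination loop behind SVM-RFE can be sketched as: train a linear SVM, drop the gene with the smallest squared weight, and retrain on what is left. The code below is only a sketch under stated assumptions: the small hand-written trainer (batch subgradient descent on the regularised hinge loss) stands in for a real SVM solver, and the learning rate, regularisation strength, epoch count and data are hypothetical.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit w, b by batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def rfe_ranking(X, y):
    """Return gene indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        # Drop the gene with the smallest squared weight, then retrain.
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]

# Hypothetical data: 6 samples x 3 genes, labels in {+1, -1}.
X = [[2.0, 0.3, 0.1], [2.2, -0.1, -0.1], [1.8, 0.2, 0.0],
     [-2.0, -0.2, 0.0], [-2.2, 0.1, 0.1], [-1.8, -0.3, -0.1]]
y = [1, 1, 1, -1, -1, -1]
print(rfe_ranking(X, y))
```

Sorting the weights from the very first training run, without retraining, would correspond to the naïve ranking; RFE differs in that the weights are recomputed after every removal.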

Another wrapper method for gene selection and classification is SVM-RCE, which combines the K-means algorithm for gene clustering with the SVM machine learning algorithm for classification and gene cluster ranking. It evaluates the contribution of each of those clusters to the classification task using SVM.
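One round of cluster scoring and elimination might look like the sketch below. It makes two simplifying assumptions that should be read as such: the gene clusters are given up front (in practice they would come from K-means), and a leave-one-out nearest-centroid classifier stands in for the SVM so the example needs no external library. All names and values are hypothetical.

```python
import math

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes` only.

    Stand-in for the SVM scoring step, chosen to keep the sketch dependency-free.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [[X[k][g] for g in genes]
                    for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        held_out = [X[i][g] for g in genes]
        guess = min(centroids, key=lambda lab: math.dist(held_out, centroids[lab]))
        correct += guess == y[i]
    return correct / len(X)

def eliminate_worst_cluster(X, y, clusters):
    """Score every gene cluster and drop the one that classifies worst."""
    scores = {name: loo_accuracy(X, y, genes) for name, genes in clusters.items()}
    worst = min(scores, key=scores.get)
    kept = {name: genes for name, genes in clusters.items() if name != worst}
    return kept, scores

# Hypothetical data: 6 samples x 4 genes; clusters assumed to come from K-means.
X = [[5.0, 5.2, 1.0, 2.0], [5.1, 4.9, 2.0, 1.0], [4.8, 5.0, 1.5, 1.5],
     [1.0, 1.1, 1.1, 1.9], [0.9, 1.2, 1.9, 1.1], [1.2, 0.8, 1.4, 1.6]]
y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
clusters = {"c1": [0, 1], "c2": [2, 3]}
kept, scores = eliminate_worst_cluster(X, y, clusters)
print(scores, list(kept))
```

Repeating this step on the surviving genes, re-clustering each time, gives the recursive part of the procedure.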

Recently, Grate described a technique for discovering small sets of genes. The technique is mainly based on a brute-force approach: an exhaustive search through all genes, gene pairs and, in some cases, gene triples. Two methods are used for classification: error-correcting output coding (ECOC) and pairwise coupling (PWC).
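The decoding step of ECOC can be illustrated with a few lines of Python: each class gets a binary codeword, one binary classifier is trained per bit, and a sample is assigned to the class whose codeword is nearest in Hamming distance to the classifiers' outputs. The code matrix and class names below are hypothetical, and the binary classifiers themselves are assumed to exist already; pairwise coupling is not shown.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, bits):
    """Pick the class whose codeword is closest to the classifiers' outputs."""
    return min(code_matrix, key=lambda cls: hamming(code_matrix[cls], bits))

# Hypothetical code matrix: one 5-bit codeword per disease class.
# Each column defines one binary classifier (1 = positive side, 0 = negative).
code_matrix = {
    "classA": (1, 1, 0, 0, 1),
    "classB": (0, 1, 1, 0, 0),
    "classC": (1, 0, 1, 1, 0),
}

# Outputs of the five binary classifiers for one sample; the second bit
# disagrees with classA's codeword, simulating one classifier error.
outputs = (1, 0, 0, 0, 1)
print(ecoc_decode(code_matrix, outputs))
```

Every pair of codewords in this matrix differs in at least three positions, which is why a single wrong bit can still be corrected.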

The biomarker pattern for distinguishing each disease category from the others is obtained with an extended Markov blanket (EMB) feature selection method. Clusters with less information are removed, while the remainder are retained for the next classification step. This process is repeated until an optimal classification result is obtained.

Conclusion: Many computational approaches have been proposed, and they are critical for mining high-dimensional data in order to discover biomarkers effectively. The best data mining approach would be to integrate different approaches to arrive at an effective algorithm, since most suggested methods ignore existing biological knowledge and treat all genes equally.

Thank You