Classification Using Top Scoring Pair Based Methods Tina Gui.

Presentation transcript:

Classification Using Top Scoring Pair Based Methods Tina Gui

Outline
- Introduction
- Top Scoring Pair
- Experiment Design
- Future Work
- Conclusion

Introduction
- Current methods for classifying DNA microarray data have two main limitations [1]:
  1. Small sample sizes
  2. Lack of interpretability
- Objective: differentiate between two classes by finding pairs of genes whose expression levels typically invert from one class to the other
[1] D. Geman, C. d'Avignon, D. Naiman and R. Winslow (2004). "Classifying gene expression profiles from pairwise mRNA comparisons".

Approaches
- Rank-based approach
  - Drawback: information is lost when expression values are replaced by their ranks
- Comparison-based approach
  - In some cases, accurate prediction can be achieved by comparing the expression levels of a single pair of genes
  - A simple example for classifying gene expression profiles: the Top Scoring Pair (TSP) classifier

Top Scoring Pair
- A profile consists of the expression levels of G genes: X = {X_1, X_2, ..., X_G}
- Each profile X has a true class label in {1, 2, ..., C}
  - Here, C = 2
- Marker gene pairs (i, j)
  - Pairs with a significant difference in the probability of X_i < X_j between class 1 and class 2
  - Profile classification is then based on the collection of distinguished pairs

Top Scoring Pair
- Quantities of interest: p_ij(c) = P(X_i < X_j | c), c = 1, 2 (the probability of observing X_i < X_j in class c)
- Score of the pair (i, j): Δ_ij = |p_ij(1) − p_ij(2)|
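As a minimal sketch of the two definitions above, the probabilities p_ij(c) can be estimated by the fraction of class-c training profiles in which X_i < X_j. The function name `pair_scores` and the toy data are hypothetical and not part of the original slides; genes are indexed from 0 here.

```python
from itertools import combinations

def pair_scores(profiles, labels):
    """Return {(i, j): delta_ij} for every gene pair with i < j.

    profiles: list of equal-length lists of expression values
    labels:   list of class labels, each 1 or 2
    """
    n_genes = len(profiles[0])
    class_sizes = {c: labels.count(c) for c in (1, 2)}
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        p = {}
        for c in (1, 2):
            # Estimate p_ij(c) as the fraction of class-c profiles with X_i < X_j.
            hits = sum(1 for x, y in zip(profiles, labels)
                       if y == c and x[i] < x[j])
            p[c] = hits / class_sizes[c]
        scores[(i, j)] = abs(p[1] - p[2])
    return scores

# Hypothetical toy data: 3 genes, 2 profiles per class.
profiles = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.0],   # class 1
            [7.0, 2.0, 4.0], [9.0, 3.0, 5.0]]   # class 2
labels = [1, 1, 2, 2]
scores = pair_scores(profiles, labels)
```

In this toy data, gene 0 is below gene 1 in every class 1 profile and never in class 2, so that pair gets the maximum possible score.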

Top Scoring Pair
- Rank the expression values
- Rank the scores Δ_ij from largest to smallest
- Select all pairs achieving the top score
- Example of scoring a gene pair:
  - 52 profiles in class 1
  - 50 profiles in class 2
  - p_ij(1) = 50/52
  - p_ij(2) = 3/50

Top Scoring Pair Classifier
- Computing the score: Δ_ij = |50/52 − 3/50| ≈ 0.90
- Note: since p_ij(1) > p_ij(2), the classifier based on this gene pair votes for class 1 for a profile with X_i < X_j, and for class 2 otherwise
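The voting rule in the note can be sketched as follows, using the probabilities from the example (50/52 and 3/50). The function name `tsp_vote` is hypothetical, and the handling of the tie p_ij(1) = p_ij(2) is an assumption, since the slides do not cover it.

```python
def tsp_vote(x_i, x_j, p1, p2):
    """Vote for the class in which the observed ordering is more likely.

    p1, p2: estimates of P(X_i < X_j | class 1) and P(X_i < X_j | class 2).
    Assumption: if p1 == p2 the pair carries no information; this sketch
    then falls through to the 'p1 is not larger' branches.
    """
    if x_i < x_j:
        return 1 if p1 > p2 else 2
    return 2 if p1 > p2 else 1

p1 = 50 / 52            # p_ij(1) from the example
p2 = 3 / 50             # p_ij(2) from the example
delta = abs(p1 - p2)    # score of the pair

vote_when_less = tsp_vote(1.0, 2.0, p1, p2)      # profile with X_i < X_j
vote_otherwise = tsp_vote(2.0, 1.0, p1, p2)      # profile with X_i >= X_j
```

With these numbers a profile showing X_i < X_j receives a vote for class 1, matching the note above.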

K-TSP Classifier
- In some instances, the top scoring pairs change when the training data are perturbed by adding or deleting a few examples
- The k-TSP classifier uses the k top scoring disjoint gene pairs from the ranked list
- Goal: increase the accuracy of the TSP classifier
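One way to read "k top scoring disjoint gene pairs" is a greedy pass over the ranked list that skips any pair sharing a gene with a pair already chosen. The sketch below follows that reading and combines the pairs by unweighted majority vote with an odd k. Both choices are assumptions, since the slides do not spell out the procedure, and the names `select_disjoint_pairs`, `ktsp_predict` and `class_if_less` are hypothetical.

```python
def select_disjoint_pairs(scores, k):
    """Greedily pick up to k highest-scoring pairs that share no gene."""
    used, chosen = set(), []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (i, j), _ in ranked:
        if i in used or j in used:
            continue  # overlaps an already chosen pair
        chosen.append((i, j))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen

def ktsp_predict(profile, pairs, class_if_less):
    """Majority vote over pairs; class_if_less[(i, j)] is the class voted
    for when X_i < X_j (the other class is voted for otherwise)."""
    votes = []
    for i, j in pairs:
        c = class_if_less[(i, j)]
        votes.append(c if profile[i] < profile[j] else 3 - c)
    return 1 if votes.count(1) > votes.count(2) else 2

# Hypothetical scores for 6 genes.
scores = {(0, 1): 0.9, (0, 2): 0.8, (2, 3): 0.7, (1, 3): 0.6, (4, 5): 0.5}
pairs = select_disjoint_pairs(scores, k=3)
class_if_less = {(0, 1): 1, (2, 3): 2, (4, 5): 1}
pred_a = ktsp_predict([1, 2, 5, 4, 0, 3], pairs, class_if_less)
pred_b = ktsp_predict([2, 1, 5, 4, 3, 0], pairs, class_if_less)
```

Using an odd k avoids ties in the vote; with an even k this sketch would resolve ties in favor of class 2.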

Experiment Design
- Baseline
- Augmented space
- Alternate space

Baseline
- Raw data: table with N rows and M attribute columns (A_1, ..., A_13, ..., A_21, ..., A_45, ..., A_M)
- TSP classifier, ranked pairs: (A_13 : A_45), (A_7 : A_21), (A_1 : A_72), (A_1 : A_25), ..., (A_x : A_y)

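Augment
- Add the K top ranked pairs to the raw data as new columns
- Resulting table: N rows and M + K columns (A_1, ..., A_72, A_7_45, A_13_21, A_1_72, ..., A_a_b)
- K-TSP classifier, ranked pairs: (A_13 : A_45), (A_7 : A_21), (A_1 : A_72), (A_1 : A_25), ..., (A_a : A_b)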

Alteration
- Use the K-TSP columns only
- Resulting table: N rows and K columns (A_7_45, A_13_21, A_1_72, ..., A_x_y)
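The augmented and alternate spaces can be sketched as below. The slides do not say how a pair column such as A_7_45 is computed, so this sketch assumes a binary indicator that is 1 when the first attribute is smaller than the second. The function names are hypothetical and attributes are indexed from 0.

```python
def pair_features(row, pairs):
    """Assumed encoding: 1 if row[a] < row[b], else 0, for each pair (a, b)."""
    return [1 if row[a] < row[b] else 0 for a, b in pairs]

def augmented_space(data, pairs):
    """Keep the M raw columns and append K pair columns (M + K in total)."""
    return [list(row) + pair_features(row, pairs) for row in data]

def alternate_space(data, pairs):
    """Keep only the K pair columns."""
    return [pair_features(row, pairs) for row in data]

# Hypothetical data: N = 2 rows, M = 3 attributes, K = 2 selected pairs.
data = [[3, 1, 2],
        [1, 2, 3]]
pairs = [(0, 1), (1, 2)]
augmented = augmented_space(data, pairs)
alternate = alternate_space(data, pairs)
```

Either table can then be passed to any standard classifier, with the baseline being the same classifier on the raw N by M data.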

Future Work
- Combination of decision trees and top scoring pairs [1]
[1] Czajkowski M, Krętowski M. (2011) "Top Scoring Pair Decision Tree for Gene Expression Data Analysis,"

Conclusion
- TSP classifier predictions are based entirely on the top scoring pairs
- Beauty of Top Scoring Pair: simplicity
- Main goal: improve the classification accuracy