Label propagation algorithm


Label propagation algorithm
Chun-Chi Liu (Jim) (劉俊吉)
Institute of Genomics and Bioinformatics, National Chung Hsing University
E-mail: jimliu@nchu.edu.tw
http://syslab.nchu.edu.tw/

Title: Systems Biology and Gene Function Study
Abstract: Systems biology usually integrates various types of biological data such as genomics, transcriptomics, proteomics, gene regulatory networks, and pathways. Bioinformatics provides the integration tools needed to perform systematic data analysis and construct biological models. In this talk, I will introduce several computational systems biology methods that can be used in gene function studies.

Gene function prediction: network construction for gene function prediction
- Co-expression: gene expression data. Two genes are linked if their expression levels are similar across conditions in a gene expression study (a minimal sketch of this construction follows below).
- Physical interaction: protein-protein interaction data.
- Genetic interaction: genetic interaction data. Two genes are functionally associated if the effects of perturbing one gene are modified by perturbations to a second gene.
- Shared protein domains: protein domain data. Two gene products are linked if they share a protein domain.
- Co-localization: genes expressed in the same tissue, or proteins found in the same location.
- Pathway: pathway data. Two gene products are linked if they participate in the same reaction within a pathway.
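As a concrete illustration of the co-expression construction, here is a minimal sketch (not from the slides; the function name and the correlation threshold are assumptions) that links two genes when the absolute Pearson correlation of their expression profiles exceeds a cutoff:

import numpy as np

def coexpression_network(expr, threshold=0.8):
    # expr: (genes x conditions) expression matrix.
    # Link two genes if the absolute Pearson correlation of their
    # expression profiles across conditions exceeds the threshold.
    corr = np.corrcoef(expr)                  # gene-by-gene correlation matrix
    W = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(W, 0.0)                  # no self-links
    return W

# toy usage: 5 genes measured under 10 conditions
expr = np.random.rand(5, 10)
W_coexp = coexpression_network(expr)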

GeneMANIA algorithm
GeneMANIA stands for Multiple Association Network Integration Algorithm (Mostafavi et al., Genome Biology 2008). The GeneMANIA algorithm consists of two parts:
1. A linear regression-based algorithm that calculates a single composite functional association network from multiple data sources.
2. A label propagation algorithm for predicting gene function given the composite functional association network.
Reference: Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q (2008) GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology 9:S4.
GeneMANIA treats gene function prediction as a binary classification problem. Each functional association network derived from the data sources is assigned a positive weight reflecting that source's usefulness for predicting the function, and the weighted average of the association networks forms a function-specific composite network. GeneMANIA fits the weights with an objective function separate from the one used for prediction; this simplifies the optimization problem and decreases the run time. Gene function is then predicted from the composite network using a variation of the Gaussian field label propagation algorithm suited to gene function prediction, where there are typically relatively few positive examples. Label propagation assigns each node in the network a score (the discriminant value) that reflects the computed strength of its association with the seed list defining the given function; thresholding this score yields predictions for the function (a minimal end-to-end sketch follows below).
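A minimal end-to-end sketch of the two steps, under stated assumptions: the weight-fitting step below is plain ridge regression of the vectorized networks against a simple label outer-product target, and the propagation step solves (I + L) f = y; the exact targets, bias terms, and constants of the published algorithm are not reproduced here.

import numpy as np

def genemania_sketch(networks, y, lam=1.0):
    # Hypothetical sketch: (1) fit one ridge-regression weight per
    # association network, (2) propagate labels on the composite network.
    # networks: list of (n x n) symmetric association matrices.
    # y: length-n label vector (+1 positives, -1 negatives, 0 unlabelled).
    # Step 1: network combination by ridge regression.
    target = np.outer(y, y)                               # simple co-labelling target (assumption)
    Omega = np.column_stack([Wk.ravel() for Wk in networks])
    alpha = np.linalg.solve(Omega.T @ Omega + lam * np.eye(len(networks)),
                            Omega.T @ target.ravel())
    alpha = np.clip(alpha, 0.0, None)                     # keep non-negative weights
    W = sum(a * Wk for a, Wk in zip(alpha, networks))     # composite network
    # Step 2: label propagation (Gaussian-field style).
    L = np.diag(W.sum(axis=1)) - W                        # graph Laplacian
    f = np.linalg.solve(np.eye(len(y)) + L, y)            # discriminant values
    return f, alpha

Thresholding the returned discriminant values f (e.g. f > 0) gives the predicted gene-function labels.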

Network combination by ridge regression: the vector of network weights α is fit by ridge regression against a target vector t (Mostafavi et al., Genome Biology 2008).
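A hedged reconstruction of the objective, assuming the standard ridge form and writing Ω for the matrix whose columns are the vectorized individual networks (this notation is an assumption, not the slide's):

\hat{\alpha} = \arg\min_{\alpha}\; \lVert t - \Omega \alpha \rVert^{2} + \lambda \lVert \alpha \rVert^{2}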

Example: the composite network W is a weighted sum of the individual networks, e.g. W = 0.6 W1 + 0.4 W2 + 0.4 W3. (Figure: the individual networks and their weighted combination.)

Positive and negative labels. (Figure: positively and negatively labelled genes in the network.)

Linear regression (reference: Jieping Ye, Arizona State University). Given examples (x_i, y_i), predict y for a new point x. (Figure: scatter plot of temperature measurements, generated in MATLAB with scatter(1:20,10+(1:20)+2*randn(1,20),'k','filled'); a=axis; a(3)=0; axis(a);)

Linear regression (reference: Jieping Ye, Arizona State University). (Figure: the same temperature scatter plot, annotated with predicted values.)

Ordinary Least Squares (OLS) (reference: Jieping Ye, Arizona State University). For each data point, the error or "residual" is the difference between the observation and the prediction; summing the squared residuals gives the sum squared error. (Figure: the temperature scatter plot with residuals marked between observations and predictions.)
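In standard notation (writing w for the weight vector of the linear model, an assumption here), the sum squared error is:

\mathrm{SSE}(w) = \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^{2} = \sum_{i=1}^{n} \bigl( y_i - w^{\top} x_i \bigr)^{2}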

Minimize the sum squared error: setting the derivative of the objective with respect to the parameters to zero yields a linear equation, i.e. a linear system (the normal equations) that can be solved for the parameters. Reference: Jieping Ye, Arizona State University.
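A hedged reconstruction of the linear system, writing X for the design matrix whose rows are the x_i and y for the response vector (symbols assumed):

\nabla_{w}\,\mathrm{SSE}(w) = -2\, X^{\top} \bigl( y - X w \bigr) = 0 \;\;\Longrightarrow\;\; X^{\top} X\, w = X^{\top} y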

Ridge Regression
Ridge regression shrinks the regression coefficients by imposing a penalty on their size; the ridge coefficients therefore minimize a penalized residual sum of squares (RSS). The penalty parameter λ ≥ 0 controls the amount of shrinkage: the larger the λ, the greater the shrinkage. Alternatively, the problem can be written as a constrained minimization with an explicit size bound s on the coefficients; there is a one-to-one correspondence between s and λ.
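In the standard textbook form (e.g. Hastie, Tibshirani and Friedman), the penalized objective is:

\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta}\; \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}

equivalently, minimize the RSS subject to \sum_{j=1}^{p} \beta_j^{2} \le s.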

The solution (Mostafavi et al., Genome Biology 2008)
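Assuming the ridge formulation sketched above (with Ω, t, and λ as before), the standard closed-form solution would be (a sketch, not necessarily the paper's exact expression):

\hat{\alpha} = \bigl( \Omega^{\top} \Omega + \lambda I \bigr)^{-1} \Omega^{\top} t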

Trace (linear algebra)
In linear algebra, the trace of an n-by-n square matrix A is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of A, i.e. \mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.
http://en.wikipedia.org/wiki/Trace_(linear_algebra)

Label propagation algorithm
y is the training label vector
f is the vector of predicted labels (discriminant values)
w_{ij} is the association between genes i and j
L = D − W is the graph Laplacian matrix
(Mostafavi et al., Genome Biology 2008)
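A hedged sketch of the Gaussian-field objective consistent with these definitions (constants and any label-bias terms used in the paper are omitted):

f^{*} = \arg\min_{f}\; \sum_{i} \bigl( f_i - y_i \bigr)^{2} + \sum_{i<j} w_{ij} \bigl( f_i - f_j \bigr)^{2} = \arg\min_{f}\; (f - y)^{\top}(f - y) + f^{\top} L f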

(Figure: a node i with discriminant value f_i and a neighbouring node j in the network.)

Solving a sparse linear system (Mostafavi et al., Genome Biology 2008). Setting the gradient of the objective above to zero gives a sparse linear system in f, which can be solved efficiently; a minimal sketch follows below.
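A minimal sketch, assuming the system takes the form (I + L) f = y obtained by differentiating the objective above (the published system includes additional bias and weighting terms not reproduced here); it uses SciPy's sparse conjugate-gradient solver:

import numpy as np
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import cg

def propagate_labels(W, y):
    # Solve (I + L) f = y for the discriminant values f, where
    # L = D - W is the graph Laplacian of the composite network W.
    W = csr_matrix(W)
    degrees = np.asarray(W.sum(axis=1)).ravel()
    L = diags(degrees) - W                       # graph Laplacian
    A = identity(W.shape[0], format="csr") + L   # sparse system matrix
    f, info = cg(A, y)                           # conjugate-gradient solve
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return f

# toy example: four genes, one positive and one negative label
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, -1.0])
print(propagate_labels(W, y))

For genome-scale networks W is sparse, so building L as a sparse matrix and solving with conjugate gradient keeps both memory use and run time manageable.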

http://en.wikipedia.org/wiki/Product_rule

Laplacian matrix: L = D − W, where D is the diagonal degree matrix and W is the weighted adjacency matrix.
http://en.wikipedia.org/wiki/Laplacian_matrix
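A small worked example (not from the slide): for a three-node path graph 1–2–3 with unit edge weights,

W = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad L = D - W = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}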

Comparison of nine methods (Peña-Castillo et al., Genome Biology 2008)

GeneMANIA performance, panel (C) (Peña-Castillo et al., Genome Biology 2008)

Thank you!