1
Networks of Protein Interactions Construction of Networks from Diverse Data Sources Neda Nategh CS 374 Lecture 16 November 7, 2006
2
What we have learned about interaction networks in CS374 Properties of interaction networks (Susan) Comparison of networks across species (Chuan Sheng) Network alignment Construction of networks from diverse data sources (Neda) Network integration
3
Biological aspects Basics of protein Interaction Networks
4
Types of interactions Physical interactions Protein pairs are in direct contact Complex interaction Protein pairs participate in the same functional module Metabolic pathway Signaling network Multiprotein complex Eukaryote-like glycosylation system of Campylobacter jejuni Cell division machinery of Caulobacter crescentus
5
Protein Complex A protein complex is a group of two or more associated proteins. Networks of proteins Topological properties Functional organization news.uns.purdue.edu/UNS/images/cramer.photo2.jpeg
6
Metabolic Pathway Metabolic pathway is a series of chemical reactions occurring within a cell catalyzed by enzymes formation of a metabolic product initiation of another metabolic pathway Metabolic Networks http://en.wikipedia.org/wiki/Metabolic_pathway
7
Signaling network Signal transduction Process by which a cell converts one kind of signal or stimulus into another. A sequence of biochemical reactions inside the cell, which are carried out by enzymes and linked through second messengers. http://en.wikipedia.org/wiki/Signal_transduction
8
High-throughput data Co-expression Co-location Co-inheritance Co-evolution Co-citation Rosetta stone (gene fusions)
9
Expression Gene expression, or simply expression, is the process by which a gene's DNA sequence is converted into the structures and functions of a cell. Indirectly, the expression of particular genes may be assessed with DNA microarray technology, which can provide a rough measure of the cellular concentration of different messenger RNAs; http://en.wikipedia.org/wiki/DNA_microarray
10
Inheritance Proteins are clustered according to the similarity of their phylogenetic profiles. Similar profiles show a correlated pattern of inheritance and, by implication, functional linkage.
11
Evolution Evolution is the change in the heritable traits of a population over successive generations, as determined by shifts in the allele frequencies of genes.
12
Gene fusion A fusion gene is a hybrid gene formed from two previously separate genes. translocation interstitial deletion chromosomal inversion By creating a fusion gene of a protein of interest and green fluorescent protein, the protein of interest may be observed in cells or tissue using fluorescence microscopy. The protein synthesized when a fusion gene is expressed is called a fusion protein. http://en.wikipedia.org/wiki/Gene_fusion
13
Experiments Microarray analysis of gene expression Systematic protein interaction mapping Mass spectrometry Yeast two hybrid Synthetic lethal screens
14
Microarray analysis of gene expression DNA microarray or gene/genome chip, DNA chip, or gene array Collection of microscopic DNA spots attached to a solid surface, such as glass, plastic or silicon chip forming an array for the purpose of expression profiling, monitoring expression levels for thousands of genes simultaneously. Applications: Identification of sequence Determination of expression level of genes http://phys.chem.ntnu.no/~bka/images/MicroArrays.jpg
15
Affinity purification/Mass spectrometry For characterization of proteins Using quantitative mass spectrometry to analyze the composition of a partially purified protein complex. Interacting proteins can be distinguished from nonspecifically co-purifying proteins by their abundance ratios. Complexes can be analyzed after a single step purification Better detection of weakly associated proteins http://en.wikipedia.org/wiki/Image:Mass_spectrom.gif
16
Yeast Two Hybrid Two-hybrid screening is a molecular biology technique used to discover protein-protein interactions by testing for physical interactions (such as binding) between two proteins. Susan Tang’s presentation, CS374 algorithms in biology, Stanford University
17
Synthetic lethal screening To interpret genetic networks by examining the effects on the cell when pairs of genes are knocked out simultaneously. Knocking out each gene separately may have no phenotypic effect because of robustness provided by genetic redundancy, but knocking out both genes has a severe, possibly lethal effect.
18
Basics of protein Interaction Networks Computational aspects
19
Statistics terminology Probability Probability density Conditional probability Prior/Posterior probability Bayes’ rule
20
Statistics terminology
            True             False
Positive    True positive    False positive (type I error)
Negative    True negative    False negative (type II error)
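The four outcomes in the table can be counted directly when predicted interactions are compared against a reference set. The following is a minimal sketch; the function name `confusion_counts` and the example lists are hypothetical and not taken from the lecture.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for paired boolean lists.

    predicted[i] is True if pair i was predicted to interact,
    actual[i] is True if pair i interacts in the reference set.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1  # type I error
        elif not p and a:
            counts["FN"] += 1  # type II error
        else:
            counts["TN"] += 1
    return counts


# Hypothetical predictions for five protein pairs.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
counts = confusion_counts(predicted, actual)
print(counts)
```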
21
Graph theory We map interaction networks to graphs Vertex (node) Edge Cycle Directed Edge (Arc) Weighted Edge [Figure: example graphs with edge weights -5, 7 and 10]
22
Networks in our model Undirected graphs Nodes correspond to proteins Edges represent the interactions Edge weights represent interaction probabilities
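One way to hold such a network in memory is a dictionary keyed by unordered protein pairs. This is only an illustrative sketch: the class name `InteractionNetwork`, its methods and the protein names are assumptions made for the example, not part of the lecture.

```python
class InteractionNetwork:
    """Undirected graph: nodes are proteins, edge weights are probabilities."""

    def __init__(self):
        # frozenset({a, b}) -> interaction probability; the frozenset key
        # makes the edge (a, b) identical to the edge (b, a).
        self._edges = {}

    def add_interaction(self, a, b, probability):
        if a == b:
            raise ValueError("self-interactions are not modelled here")
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self._edges[frozenset((a, b))] = probability

    def probability(self, a, b):
        """Return the edge weight, or 0.0 if no edge is stored."""
        return self._edges.get(frozenset((a, b)), 0.0)

    def neighbors(self, protein):
        """Return the sorted list of proteins linked to `protein`."""
        found = set()
        for pair in self._edges:
            if protein in pair:
                found.update(pair - {protein})
        return sorted(found)


# Hypothetical proteins and probabilities.
net = InteractionNetwork()
net.add_interaction("ProtA", "ProtB", 0.9)
net.add_interaction("ProtB", "ProtC", 0.4)
print(net.probability("ProtB", "ProtA"))  # same edge in either order
print(net.neighbors("ProtB"))
```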
23
Network Clustering 7000 Yeast interactions among 3000 proteins
24
Training sets KEGG (Kyoto Encyclopedia of Genes and Genomes) GFP (Green Fluorescent Protein) GO (Gene Ontology) COG (Clusters of Orthologous Groups of proteins)
25
Genomics 1 genome Assembly, Gene Finding Comparative Genomics N genomes Sequence Alignment Functional Genomics 1 assay Microarray Analysis Integrative Genomics N assays Network Integration
26
A probabilistic functional network of yeast genes Insuk Lee, Shailesh V. Date, Alex T. Adai, Edward M. Marcotte
27
Motivation Knowledge of gene networks’ structure Complex roles of individual genes interplay between many systems in a cell
28
Problem Heterogeneous functional genomics data (microarray analyses of gene expression, systematic protein interaction mapping) measure different aspects of gene or protein associations. Mass spectrometry: measures the tendency for proteins to be components of the same physical module. Yeast two-hybrid assays: indicate direct physical interaction (stable or transient) between proteins. Synthetic lethal screens: measure the tendency for genes to compensate for the loss of other genes.
29
Idea of integration Constructing a more accurate and extensive gene network Considering functional rather than physical associations genetic biochemical computational probabilistic gene-gene linkages Single coherent network
30
Scoring scheme Based on a Bayesian statistics approach Log-likelihood score: LLS = ln[ (P(L|E) / ~P(L|E)) / (P(L) / ~P(L)) ] P(L|E) and ~P(L|E) are the frequencies of linkages (L) observed in the given experiment (E) between annotated genes operating in the same pathway and in different pathways, respectively. P(L) and ~P(L) are the total frequencies of linkages between all annotated yeast genes operating in the same pathway and in different pathways, respectively.
31
Scoring scheme LLS > 0 Experiments tend to link genes in the same pathway Higher scores More confident linkages proportional to the accuracy of the experiments Different experiments’ scores are directly comparable
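The log-likelihood score can be evaluated from four counts. The sketch below assumes the log-odds-ratio form LLS = ln[(P(L|E)/~P(L|E)) / (P(L)/~P(L))]; the function name, argument names and all counts are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def log_likelihood_score(same_in_exp, diff_in_exp, same_total, diff_total):
    """Log-likelihood score of one experiment against annotated pathways.

    same_in_exp / diff_in_exp: linkages seen in the experiment between
        annotated genes in the same pathway / in different pathways.
    same_total / diff_total: all possible pairs of annotated genes in the
        same pathway / in different pathways (the background).
    """
    if min(same_in_exp, diff_in_exp, same_total, diff_total) <= 0:
        raise ValueError("all counts must be positive")
    odds_given_evidence = same_in_exp / diff_in_exp
    prior_odds = same_total / diff_total
    return math.log(odds_given_evidence / prior_odds)


# Hypothetical experiment: 80 of its 100 linkages join same-pathway genes,
# while only 1 in 10 annotated gene pairs share a pathway by chance.
lls_informative = log_likelihood_score(80, 20, 1000, 9000)

# Hypothetical experiment that links genes no better than chance.
lls_random = log_likelihood_score(10, 90, 1000, 9000)

print(lls_informative, lls_random)
```

An experiment that performs at the background rate scores about zero, and because every score is measured against the same background odds, scores from different experiments sit on a common scale.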
32
Data sources
33
Benchmarked accuracy and extent of functional genomics data sets and the integrated networks
34
Results Evidence from diverse sources Estimating the functional coupling between yeast genes A view of relations between yeast proteins distinct from their physical interactions Probabilistic gene network
35
Future directions Application of this strategy to other organisms such as human (i) assemble benchmarks for measuring the accuracy of linkages between human genes (ii) assemble gold standard sets of highly accurate interactions for calibrating the benchmarks (iii) benchmark functional genomics data for their ability to correctly link human genes then integrate the data as described.
36
Integrated protein interaction networks for 11 Microbes Balaji S. Srinivasan, Antal F. Novak, Jason A. Flannick, Serafim Batzoglou, Harley H. McAdams
37
Motivation There are different methods to predict the interactions, but the networks generated by each method are often contradictory. Objective: constructing a summary network for each species which uses all the evidence at hand to predict which proteins are functionally linked
38
Data sources Co-expression Microarray expression data (genes by arrays) are converted by Pearson correlation into a gene-by-gene matrix of co-expression scores. [Figure: Balaji S. Srinivasan]
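The co-expression score between two genes is the Pearson correlation of their expression profiles across arrays. A minimal sketch follows; the gene names and expression values are hypothetical, and the function is written out by hand so that the example has no dependencies.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes on four arrays.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(r_ab, r_ac)
```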
39
Data sources Co-location Chromosomal locations of each protein's gene across assembled genomes are converted by average chromosomal distance into a protein-by-protein matrix of co-location scores. [Figure: Balaji S. Srinivasan]
40
Data sources Co-evolution Multiple alignments of protein families are converted by tree distances into a family-by-family matrix of co-evolution scores. [Figure: Balaji S. Srinivasan]
41
Data sources Co-inheritance BLAST bit scores of each protein across species are converted by Spearman correlation into a protein-by-protein matrix of co-inheritance scores. [Figure: Balaji S. Srinivasan]
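Spearman correlation is the Pearson correlation of the ranks, so it depends only on the ordering of the bit scores across species. The sketch below uses hypothetical bit scores for two proteins in four species; the helper names are assumptions made for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(vx * vy)


# Hypothetical BLAST bit scores of two proteins in four species.
protein_a = [620, 210, 340, 90]
protein_b = [480, 120, 300, 200]
rho = spearman(protein_a, protein_b)
print(rho)
```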
42
Integration of two predictors Previous work Recent work Method presented in this paper
43
Previous work We can integrate two given networks by intersection union average + coexpression coinheritance =
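These three combinations are simple to state on edge dictionaries. The sketch below is one possible reading: intersection and union operate on edge sets, and the average treats an edge missing from one network as having score 0. That last choice, like all names and scores here, is an assumption and is not specified on the slide.

```python
def edge(a, b):
    """Unordered protein pair used as a dictionary key."""
    return frozenset((a, b))


def intersection(net1, net2):
    """Edges predicted by both networks."""
    return set(net1) & set(net2)


def union(net1, net2):
    """Edges predicted by at least one network."""
    return set(net1) | set(net2)


def average(net1, net2):
    """Mean score per edge; an absent edge counts as 0.0 (assumption)."""
    return {
        e: (net1.get(e, 0.0) + net2.get(e, 0.0)) / 2
        for e in set(net1) | set(net2)
    }


# Hypothetical scores from two predictors.
coexpression = {edge("A", "B"): 0.8, edge("A", "C"): 0.4}
coinheritance = {edge("A", "B"): 0.6, edge("B", "C"): 0.9}

both = intersection(coexpression, coinheritance)
either = union(coexpression, coinheritance)
mean_net = average(coexpression, coinheritance)
print(len(both), len(either), mean_net[edge("A", "B")])
```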
44
Recent work Bayesian Networks (Troyanskaya 2003) Decision Trees (Wong 2004) Naïve Bayes + Boosting (Lu 2005) Likelihood Ratios (Lee 2004)
45
Training sets COG GO KEGG Going from COG to GO to KEGG, the fraction of annotated proteins in a given organism decreases and the annotation quality increases
46
1D Bayes’ rule Bayes’ Rule: Calculate conditional probability of linkage given evidence Balaji S. Srinivasan
47
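1D Bayes’ rule Bayes error rate = minimum error rate of a classifier. For a protein pair (A, B), the evidence E is known and the linkage L is unknown: L=0 means different function, L=1 means same function. The quantity of interest is P(L|E). Balaji S. Srinivasan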
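For a single evidence value, Bayes' rule gives P(L=1|E) = p(E|L=1) P(L=1) / [p(E|L=1) P(L=1) + p(E|L=0) P(L=0)]. The sketch below evaluates this formula; the function name and the density and prior values are hypothetical and chosen only to show the arithmetic.

```python
def posterior_linked(density_linked, density_unlinked, prior_linked):
    """Posterior P(L=1 | E) from class-conditional densities at E.

    density_linked:   p(E | L=1), density of the evidence among linked pairs
    density_unlinked: p(E | L=0), density among unlinked pairs
    prior_linked:     P(L=1), prior probability that a pair is linked
    """
    if not 0.0 <= prior_linked <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    numerator = density_linked * prior_linked
    denominator = numerator + density_unlinked * (1.0 - prior_linked)
    if denominator == 0:
        raise ValueError("evidence has zero density under both classes")
    return numerator / denominator


# Hypothetical values: the evidence is 4x more typical of linked pairs,
# but only 10% of pairs are linked a priori.
p = posterior_linked(density_linked=2.0, density_unlinked=0.5,
                     prior_linked=0.1)
print(p)
```

Even evidence that favours linkage fourfold leaves the posterior below one half here, because linked pairs are rare to begin with.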
48
2D network integration 2D scatter plot Separates linked pairs from unlinked pairs more efficiently co-expression vs. co-inheritance
49
2D network integration Estimate densities Kernel density estimation Gray-Moore dual tree algorithm
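The density estimation step can be illustrated with a naive Gaussian kernel density estimate over two evidence types, followed by Bayes' rule. This sketch sums over every training point for each query and does not implement the Gray-Moore dual tree algorithm, which is a faster way to evaluate the same kind of estimate. The bandwidth, prior, training points and function names are all hypothetical.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function estimating the density at (x, y) from 2D points."""
    if not points or bandwidth <= 0:
        raise ValueError("need at least one point and a positive bandwidth")
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def posterior_2d(x, y, dens_linked, dens_unlinked, prior_linked):
    """P(L=1 | E1=x, E2=y) from the two estimated densities."""
    num = dens_linked(x, y) * prior_linked
    den = num + dens_unlinked(x, y) * (1.0 - prior_linked)
    return num / den if den > 0 else 0.0


# Hypothetical training pairs: (co-expression, co-inheritance) scores.
linked = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
unlinked = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1), (0.2, 0.2)]

dens_linked = gaussian_kde_2d(linked, bandwidth=0.2)
dens_unlinked = gaussian_kde_2d(unlinked, bandwidth=0.2)

p_high = posterior_2d(0.8, 0.8, dens_linked, dens_unlinked, prior_linked=0.1)
p_low = posterior_2d(0.0, 0.0, dens_linked, dens_unlinked, prior_linked=0.1)
print(p_high, p_low)
```

The same construction extends to more than two evidence types by replacing the squared distance with its higher-dimensional form and adjusting the normalising constant.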
50
2D network integration
51
Posterior probability of interactions P(L=1|E) visual, geometric interpretation Balaji S. Srinivasan likely to interact may interact unlikely to interact
52
2D network integration Joint density reveals hidden biology moderate evidence from multiple sources is better than strong evidence from one source subtle interactions missed using one evidence 30-60% of interaction data falls in this region! Balaji S. Srinivasan
53
N predictors Given all evidences N=3 coexpression (E1) colocation (E2) coinheritance (E3)
54
Conclusion and Future directions This algorithm can be generalized to apply to discrete, ordinal or categorical data sets and is applicable to larger eukaryotic genomes Possibility of comparative modular biology Align subgraphs of interaction networks Network alignment algorithm scalable to large data sets and to comparing many species simultaneously.
55
Question ?