Gene and Protein Networks Monday, April 10 2006 CSCI 7000-005: Computational Genomics Debra Goldberg


1 Gene and Protein Networks Monday, April 10 2006 CSCI 7000-005: Computational Genomics Debra Goldberg debg@hms.harvard.edu

2 What is a network? A collection of objects (nodes, vertices) Binary relationships (edges) May be directed Also called a graph
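As a minimal sketch of the definition above, a network can be stored as a dictionary that maps each node to the set of its neighbors. The function name `build_graph` and the node labels are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

def build_graph(edges, directed=False):
    """Return an adjacency-set dictionary for a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so that nodes with no outgoing edges still appear
        if not directed:
            adj[v].add(u)
    return dict(adj)

# Hypothetical example: four nodes, four binary relationships.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
undirected = build_graph(edges)
directed = build_graph(edges, directed=True)
```

In the undirected version each edge is stored in both directions; in the directed version only the source node lists the target.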

3 Networks are everywhere

4 Social networks from www.liberality.org Nodes: People Edges: Friendship

5 Sexual networks Nodes: People Edges: Romantic and sexual relations

6 Transportation networks Nodes: Locations Edges: Roads

7 Power grids Nodes: Power station Edges: High voltage transmission line

8 Airline routes Nodes: Airports Edges: Flights

9 Internet Nodes: MBone Routers Edges: Physical connection

10 Internet Nodes: Autonomous systems Edges: Physical connection

11 World-Wide-Web Nodes: Web documents Edges: Hyperlinks

12 Gene and protein networks

13 Metabolic networks Nodes: Metabolites Edges: Biochemical reaction (enzyme) from web.indstate.edu

14 Metabolic networks Drug targets predicted Nodes: Metabolites Edges: Biochemical reaction (enzyme) from www.bact.wisc.edu

15 Metabolic networks Nodes: Metabolites Edges: Biochemical reaction (enzyme)

16 Protein interaction networks Gene function predicted from www.embl.de Nodes: Proteins Edges: Observed interaction

17 Gene regulatory networks Inferred from error-prone gene expression data from Wyrick et al. 2002 Nodes: Genes or gene products Edges: Regulation of expression

18 Signaling networks Nodes: Molecules (e.g., Proteins or Neurotransmitters) Edges: Activation or Deactivation from pharyngula.org

19 Signaling networks Nodes: Molecules (e.g., Proteins or Neurotransmitters) Edges: Activation or Deactivation from www.life.uiuc.edu

20 Synthetic sick or lethal (SSL) Wild type (X and Y both intact): cells live Either X or Y deleted alone: cells live Both X and Y deleted: cells die or grow slowly

21 SSL networks Gene function, drug targets predicted Nodes: Nonessential genes Edges: Genes co-lethal from Tong et al. 2001

22 Other biological networks Coexpression –Nodes: genes –Edges: transcribed at same times, conditions Gene knockout / knockdown –Nodes: genes –Edges: similar phenotype (defects) when suppressed

23 What they really look like…

24 We need models!

25 Traditional graph modeling Random, Regular from GD2002

26 Introduce small-world networks

27 Small-world Networks Six degrees of separation 100 – 1000 friends each Six steps: $10^{12}$ – $10^{18}$ But… We live in communities

28 Small-world measures Typical separation between two vertices –Measured by characteristic path length Cliquishness of a typical neighborhood –Measured by clustering coefficient [Figure: two example neighborhoods of a vertex v, one with $C_v = 1.00$ and one with $C_v = 0.33$]
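The two measures on this slide can be sketched as follows, assuming an undirected graph stored as a dictionary of neighbor sets. The function names and the two toy graphs are assumptions chosen to reproduce the values 1.00 and 0.33 quoted on the slide; the path length here is taken as the mean shortest-path length over connected pairs, which is one common reading of "characteristic path length".

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# All three neighbors of v are linked to each other.
full = {"v": {"a", "b", "c"}, "a": {"v", "b", "c"},
        "b": {"v", "a", "c"}, "c": {"v", "a", "b"}}
# Only one of the three possible neighbor pairs (a, b) is linked.
sparse = {"v": {"a", "b", "c"}, "a": {"v", "b"},
          "b": {"v", "a"}, "c": {"v"}}

c_full = clustering_coefficient(full, "v")
c_sparse = clustering_coefficient(sparse, "v")
path_sparse = characteristic_path_length(sparse)
```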

29 Watts-Strogatz small-world model

30 Measures of the W-S model Path length drops faster than cliquishness Wide range of p has both small-world properties
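A rough sketch of a Watts-Strogatz style construction, for experimenting with how the two measures respond to the rewiring probability p. It assumes the usual description of the model (a ring lattice whose edges are rewired at random with probability p); the function names, parameters, and seed handling are illustrative choices, not taken from the slides.

```python
import random
from collections import deque
from itertools import combinations

def watts_strogatz(n, k, p, seed=0):
    """Ring lattice of n nodes, each linked to its k nearest neighbors
    (k even); every lattice edge is rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    for offset in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + offset) % n
            if j in adj[i] and rng.random() < p:
                # Move the far end of edge (i, j) to a random non-neighbor.
                candidates = [w for w in range(n) if w != i and w not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def mean_clustering(adj):
    """Average clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        if deg >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (deg * (deg - 1) / 2)
    return total / len(adj)

def mean_path_length(adj):
    """Mean BFS distance over all connected ordered pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

lattice = watts_strogatz(20, 4, 0.0)           # regular graph, p = 0
rewired = watts_strogatz(20, 4, 0.1, seed=1)   # a few random shortcuts
```

Calling `mean_clustering` and `mean_path_length` on graphs built with increasing p is one way to explore the behavior the slide describes; rewiring keeps the number of edges fixed, so the graphs stay comparable.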

31 Small-world measures of various graph types
Graph type | Cliquishness | Characteristic Path Length
Regular graph | High | Long
Random graph | Low | Short
Small-world graph | High | Short

32 Another network property: Degree distribution P (k) The degree (notation: k ) of a node is the number of its neighbors The degree distribution is a histogram showing the frequency of nodes having each degree
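The degree distribution defined here can be computed directly from an adjacency-set dictionary. This is a small sketch; the function name and the star-shaped example graph are assumptions for illustration.

```python
from collections import Counter

def degree_distribution(adj):
    """Map each degree k to the fraction of nodes with that degree, P(k)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical star graph: one hub linked to four leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
p_k = degree_distribution(star)
```

For the star graph, four of the five nodes have degree 1 and one node has degree 4.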

33 Degree distribution of E-R random networks Binomial degree distribution, well-approximated by a Poisson Erdős-Rényi random graphs [Figure: P(k) versus degree k] Network figures from Strogatz, Nature 2001
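To illustrate the Poisson approximation, the sketch below compares the two probability mass functions for a random graph in which every possible edge is present independently with probability p. The network size n and the value of p are hypothetical values picked for the example.

```python
from math import comb, exp, factorial

def binomial_pmf(k, trials, p):
    """Probability of exactly k successes in `trials` independent tries."""
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)

def poisson_pmf(k, mean):
    """Poisson probability of k events for the given mean."""
    return exp(-mean) * mean**k / factorial(k)

n, p = 1000, 0.005        # hypothetical network size and edge probability
trials = n - 1            # each node has n - 1 potential neighbors
mean_degree = trials * p

# Print the two distributions side by side for a few degrees.
for k in range(0, 13, 2):
    print(k, round(binomial_pmf(k, trials, p), 4),
          round(poisson_pmf(k, mean_degree), 4))

# Largest pointwise difference over the degrees that carry the mass.
max_gap = max(abs(binomial_pmf(k, trials, p) - poisson_pmf(k, mean_degree))
              for k in range(30))
```

The approximation is good when n is large and p is small, which is the sparse regime typical of the networks discussed here.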

34 Degree distribution of many real-world networks Scale-free networks Degree distribution follows a power law $P(k = x) = \alpha x^{-\beta}$ [Figure: P(k) versus degree k, and log P(k) versus log k]
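A power law appears as a straight line on log-log axes, which is why the slide plots log P(k) against log k. As a rough sketch, the exponent can be read off as the negated slope of a least-squares line through the log-transformed points. The names `a` and `b` for the coefficient and exponent, and the synthetic data, are assumptions made for this example; a straight-line fit is only a quick illustration, not a rigorous estimator.

```python
from math import exp, log

def fit_power_law(degrees, probabilities):
    """Least-squares line through (log k, log P(k)).
    Returns (a, b) for the model P(k) = a * k**(-b)."""
    xs = [log(k) for k in degrees]
    ys = [log(pk) for pk in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope

# Synthetic, noise-free data generated from a = 0.5, b = 2.5.
degrees = list(range(1, 51))
probabilities = [0.5 * k ** -2.5 for k in degrees]
a_hat, b_hat = fit_power_law(degrees, probabilities)
```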

35 Hierarchical Networks Ravasz, et al., Science 2002

36 Properties of hierarchical networks 1. Scale-free 2. Clustering coefficient independent of N 3. Scaling clustering coefficient (DGM)

37 C of 43 metabolic networks Independent of N Ravasz, et al., Science 2002

38 Scaling of the clustering coefficient C(k) Metabolic networks Ravasz, et al., Science 2002

39 Many real-world networks are small-world, scale-free World-wide-web Collaboration of film actors (Kevin Bacon) Mathematical collaborations (Erdős number) Power grid of US Syntactic networks of English Neural network of C. elegans Metabolic networks Protein-protein interaction networks

41 There is information in a gene’s position in the network We can use this to predict Relationships –Interactions –Regulatory relationships Protein function –Process –Complex / “molecular machine”

42 Confidence assessment Traditionally, biological networks determined individually –High confidence –Slow New methods look at entire organism –Lower confidence (  50% false positives) Inferences made based on this data

43 Confidence assessment Can use topology to assess confidence if true edges and false edges have different network properties Assess how well each edge fits topology of true network Can also predict unknown relations Goldberg and Roth, PNAS 2003

44 Use clustering coefficient, a local property Number of triangles = $|N(v) \cap N(w)|$ Normalization factor? N(x) = the neighborhood of node x

45 Mutual clustering coefficient
Jaccard Index: $|N(v) \cap N(w)| \,/\, |N(v) \cup N(w)|$
Meet / Min: $|N(v) \cap N(w)| \,/\, \min(|N(v)|, |N(w)|)$
Geometric: $|N(v) \cap N(w)|^2 \,/\, (|N(v)| \cdot |N(w)|)$
Hypergeometric: a p-value
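The first three mutual clustering coefficients reduce to simple set arithmetic on the two neighborhoods. The sketch below assumes each neighborhood is given as a Python set; the function names and the example neighborhoods are illustrative.

```python
def jaccard(nv, nw):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = len(nv | nw)
    return len(nv & nw) / union if union else 0.0

def meet_min(nv, nw):
    """Shared neighbors divided by the smaller neighborhood size."""
    smaller = min(len(nv), len(nw))
    return len(nv & nw) / smaller if smaller else 0.0

def geometric(nv, nw):
    """Squared shared-neighbor count divided by the product of sizes."""
    product = len(nv) * len(nw)
    return len(nv & nw) ** 2 / product if product else 0.0

# Hypothetical neighborhoods of two nodes v and w sharing c and d.
n_v = {"a", "b", "c", "d"}
n_w = {"c", "d", "e"}
scores = (jaccard(n_v, n_w), meet_min(n_v, n_w), geometric(n_v, n_w))
```

Here the intersection has size 2, the union size 5, and the neighborhood sizes are 4 and 3, so the three scores are 2/5, 2/3, and 4/12.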

46 Mutual clustering coefficient Hypergeometric: $-\log P(\text{intersection at least as large by chance})$ [Figure legend: neighbors of node v; neighbors of node w; nodes in graph]
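One way to compute the hypergeometric score is to sum the hypergeometric tail: the probability that two neighborhoods of the observed sizes, drawn at random from all nodes in the graph, overlap at least as much as observed. This is a sketch under that assumption; the function name, the `total_nodes` parameter, and the choice of which nodes count toward the total are not specified on the slide.

```python
from math import comb, log

def hypergeometric_score(nv, nw, total_nodes):
    """-log of P(|intersection| >= observed) under random neighbor choice."""
    a, b = len(nv), len(nw)
    observed = len(nv & nw)
    ways_total = comb(total_nodes, b)
    # Count the ways to pick w's neighbors so that i of them fall in N(v).
    ways_tail = sum(comb(a, i) * comb(total_nodes - a, b - i)
                    for i in range(observed, min(a, b) + 1))
    return -log(ways_tail / ways_total)

# Hypothetical example: 10 nodes, |N(v)| = 4, |N(w)| = 3, 2 shared.
n_v = {"a", "b", "c", "d"}
n_w = {"c", "d", "e"}
score = hypergeometric_score(n_v, n_w, total_nodes=10)
```

In this example the tail probability is (36 + 4) / 120 = 1/3, so the score is log 3. Larger scores mean the overlap is less likely to arise by chance.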

47 Prediction A v-w edge would have a high clustering coefficient

48 Confidence assessment Integrate experimental details with local topology –Degree –Clustering coefficient –Degree of neighbors –Etc. Bader, et al., Nature Biotechnology 2003

49 The synthetic lethal network has many triangles Xiaofeng Xin, Boone Lab

50 2-hop predictors for SSL (gene pair v, w) SSL – SSL (S-S) Homology – SSL (H-S) Co-expressed – SSL (X-S) Physical interaction – SSL (P-S) 2 physical interactions (P-P) S: Synthetic sickness or lethality (SSL) H: Sequence homology X: Correlated expression P: Stable physical interaction Wong, et al., PNAS 2004

51 Multi-color motifs S: Synthetic sickness or lethality H: Sequence homology X: Correlated expression P: Stable physical interaction R: Transcriptional regulation Zhang, et al., Journal of Biology 2005

52 SSL “hubs” might be good cancer drug targets (Tong et al, Science, 2004) Normal cell Cancer cells w/ random mutations Alive Dead

53 Predict protein function from function of neighboring proteins “Guilt by association” Consider immediate neighbors –Schwikowski, et al., Nature Biotechnology 2001 Consider a given radius –Hishigaki, et al., Yeast 2001
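A minimal sketch of the "guilt by association" idea using only immediate neighbors: each annotated neighbor votes for its own functions, and the most frequent ones are proposed for the query protein. This is a simplified illustration, not the exact procedure of the cited papers; the protein names and annotations are hypothetical.

```python
from collections import Counter

def predict_function(adj, annotations, protein, top=1):
    """Return the `top` most common functions among a protein's neighbors."""
    votes = Counter()
    for nbr in adj[protein]:
        for func in annotations.get(nbr, ()):
            votes[func] += 1
    return [func for func, _ in votes.most_common(top)]

# Hypothetical interaction data: P1 is unannotated, its partners are not.
adj = {"P1": {"P2", "P3", "P4"}}
annotations = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "transport"},
    "P4": {"DNA repair"},
}
prediction = predict_function(adj, annotations, "P1")
```

Extending the vote to all proteins within a given radius, as the second bullet describes, would replace the neighbor loop with a bounded breadth-first search.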

54 Predict protein function from neighboring proteins (2) Minimize interactions between proteins with different annotations –Vazquez, et al., Nature Biotechnology 2003 –Karaoz, et al., PNAS 2004 Use network flow algorithm to “transport” function annotation –Nabieva, et al., Bioinformatics 2005

55 Lethality Hubs are more likely to be essential Jeong, et al., Nature 2001

56 Degree anti-correlation Few edges directly between hubs Edges between hubs and low-degree genes are favored Maslov and Sneppen, Science 2002

57 Beware of bias

58 Protein abundance Abundant proteins are –more likely to be represented in some types of experiments –More likely to be essential Correlation between degree (hubs) and essentiality disappears or is reduced when corrected for protein abundance Bloom and Adami, BMC Evolutionary Biology 2003

59 Degree correlation Anti-correlation of degrees of interacting proteins disappears in un-biased data Coulomb, et al., Proceedings of the Royal Society B 2005 [Figure: average degree K1 versus degree k, for essential and non-essential proteins]

60 Community structure Partitioning methods

61 Community structure Proteins in a community may be involved in a common process or function

62 Finding the communities Hierarchical clustering “Betweenness” centrality Dense subgraphs Similar subgraphs Spectral clustering Party and date hubs

63 Hierarchical clustering (1) Using natural edge weights Gene co-expression e.g., Eisen MB, et al., PNAS 1998 from www.medscape.com

64 Hierarchical clustering (2) Topological overlap A measure of neighborhood similarity $l_{i,j}$ is 1 if there is a direct link between i and j, 0 otherwise Ravasz, et al., Science 2002
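The formula itself did not survive in this transcript. The sketch below assumes one common formulation of topological overlap: the number of neighbors shared by i and j, plus the direct-link indicator, divided by the smaller of the two degrees. Treat the exact form as an assumption rather than the slide's definition; the function name and example graph are also illustrative.

```python
def topological_overlap(adj, i, j):
    """(shared neighbors + direct-link indicator) / min(degree_i, degree_j)."""
    direct = 1 if j in adj[i] else 0       # the indicator for a direct link
    shared = len(adj[i] & adj[j])
    smaller_degree = min(len(adj[i]), len(adj[j]))
    return (shared + direct) / smaller_degree if smaller_degree else 0.0

# Hypothetical graph: A is linked to B, C, D; B is also linked to E.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "E"},
    "C": {"A"},
    "D": {"A"},
    "E": {"B"},
}
overlap_ab = topological_overlap(adj, "A", "B")   # linked, nothing shared
overlap_cd = topological_overlap(adj, "C", "D")   # unlinked, share A
overlap_ce = topological_overlap(adj, "C", "E")   # unlinked, nothing shared
```

The resulting pairwise values can serve as the similarity matrix for hierarchical clustering.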

65 Hierarchical clustering (3) Adjacency vector Function cluster: Tong et al., Science 2004 Find drug targets: Parsons et al., Nature Biotechnology 2004

66 “Betweenness” centrality Consider the shortest path(s) between all pairs of nodes “Betweenness” centrality of an edge is a measure of how many shortest paths traverse this edge Edges between communities have higher centrality Girvan, et al., PNAS 2002
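Edge betweenness can be sketched with one breadth-first search per node, accumulating path counts back toward the source (a Brandes-style accumulation, which is an implementation choice made here and not something the slide specifies). When several shortest paths tie, each contributes a fractional share. The function name and example graph are assumptions.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """For each undirected edge, count the shortest paths that cross it."""
    score = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, recording path counts and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest nodes, sharing credit among predecessors.
        delta = defaultdict(float)
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Every unordered pair of nodes was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}

# Two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
betweenness = edge_betweenness(adj)
bridge = max(betweenness, key=betweenness.get)
```

In this example the edge joining the two triangles carries all nine shortest paths between them, so it has the highest score. Repeatedly removing the highest-scoring edge and recomputing is the idea behind community detection by betweenness.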

67 Dense subgraphs Spirin and Mirny, PNAS 2003 –Find fully connected subgraphs (cliques), OR –Find subgraphs that maximize density: 2 m / (n (n-1)) Bader and Hogue, BMC Bioinformatics 2003 –Weight vertices by neighborhood density, connectedness –Find connected communities with high weights
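The density formula 2m / (n(n-1)) compares the m edges present among n nodes with the n(n-1)/2 edges a clique would have. A small sketch, assuming an adjacency-set dictionary; the function name and the example graph are illustrative.

```python
def subgraph_density(adj, nodes):
    """Density 2m / (n(n-1)) of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both of its endpoints.
    m = sum(len(adj[u] & nodes) for u in nodes) // 2
    return 2 * m / (n * (n - 1))

# Hypothetical graph: a 4-clique {1, 2, 3, 4} with node 5 attached to node 4.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {4}}
clique_density = subgraph_density(adj, {1, 2, 3, 4})
mixed_density = subgraph_density(adj, {2, 3, 4, 5})
```

A fully connected subgraph has density 1; swapping one clique member for the peripheral node 5 leaves 4 of the 6 possible edges.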

68 Similar subgraphs Across species Interaction network and genome sequence e.g., Ogata, et al., Nucleic Acids Research 2000

69 Spectral clustering Compute adjacency matrix eigenvectors Each eigenvector defines a cluster: –Proteins with high magnitude contributions Bu, et al., Nucleic Acids Research 2003 [Figure panels: positive eigenvalue; negative eigenvalue]

70 Party and date hubs Protein interaction network Partition hubs by expression correlation of neighbors Han, et al., Nature 2004

71 Network connectivity Scale-free networks are: –Robust to random failures –Vulnerable to attacks on hubs Removing hubs quickly disconnects a network and reduces the size of the largest component Albert, et al., Nature 2000
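The effect of removing nodes can be checked by measuring the largest connected component that remains. The sketch below uses a toy graph with two hubs, which is a hypothetical example and far smaller than the networks studied in the cited paper; the function name is also an assumption.

```python
def largest_component_size(adj, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

# Two linked hubs, each with four low-degree neighbors.
network = {"H1": {"H2"}, "H2": {"H1"}}
for hub, prefix in (("H1", "a"), ("H2", "b")):
    for i in range(4):
        leaf = f"{prefix}{i}"
        network[hub].add(leaf)
        network[leaf] = {hub}

intact = largest_component_size(network)
after_leaf = largest_component_size(network, removed=["a0"])
after_hub = largest_component_size(network, removed=["H1"])
after_both_hubs = largest_component_size(network, removed=["H1", "H2"])
```

Removing a low-degree node barely changes the largest component, while removing the hubs breaks the graph into isolated nodes.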

72 Removing date hubs shatters network into communities Removing date hubs: many sub-networks Removing party hubs: a single main component

73 Temporal partitioning Luscombe, et al., Nature 2004

74 Final words Network analysis has become an essential tool for analyzing complex systems –There is still much biologists can learn from scientists in other disciplines The references mentioned are representative, and not comprehensive

