Human Genetics Integrative Bioinformatics using Cytoscape (and R2)

1 Human Genetics Integrative Bioinformatics using Cytoscape (and R2)

2 Human Genetics (Bio)Chemistry versus Molecular Biology: some basic concepts
(Bio)Chemistry:
– Concentrations
– Molecular structures
– Reaction equations
– Quantitative
– Defined experimental setup
Molecular Biology:
– Regulation
– Large biomolecules
– Large-scale processes
– Qualitative
– Complex experimental setup (by necessity!)

3 Human Genetics Molecular Biology: New techniques
(Deep) sequencing, arrays, proteomics
Integrative Bioinformatics needed:
Quantitative analysis
– Handling large datasets
– Statistics
Capturing complexity
– Integration
– Graphs
Integrative Bioinformatics: Integrated Bioinformaticians!

4 Human Genetics Integrative Bioinformatics: An example

5 Human Genetics Integrative Bioinformatics: What they did
1. Sequence genome; assign gene function using protein sequence, structural similarities (Bonneau et al., 2004; Ng et al., 2000).
2. Perturb cells: environmental factors; knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005).
3. Measure changes: microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006).
4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006).
5. Using the machine learning algorithm Inferelator, construct a dynamic network model for the influence of changes in EFs and TFs on the expression of coregulated genes (Bonneau et al., 2006).
6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate and then experimentally test hypotheses that drive additional iterations of steps 2–6 (Shannon et al., 2006).

6 Human Genetics Integrative Bioinformatics: Their framework

7 Human Genetics Integrative Bioinformatics: results

8 Human Genetics Goes to show that:
1. Aggregate: combine data from different sources
2. Search/Visualize: filter
3. Analyze/Feedback: algorithms
Need for adaptable software
Goal: facilitate ideas

9 Human Genetics Cytoscape: Network Visualization and Analysis
– Freely available (open-source, Java) software, easily extensible (plugin API)
– Visualizing networks (e.g. molecular interaction networks)
– Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, …)
– Used in several hundred analyses in recent literature
– Continuity guaranteed

10 Human Genetics An example Cytoscape work-flow

11 Human Genetics Cytoscape Workflow
1. Load Networks (import network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication
A specific example of this workflow: Cline et al., “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols 2, 2366-2382 (2007).

12 Human Genetics Networks as graphs
A network is a collection of:
– Nodes (or vertices)
– Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)
Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes): any sort of object.
Edges can be physical or functional interactions, activators, regulators, reactions: any sort of relation.
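The graph vocabulary on this slide can be made concrete with a small data structure. The following is a minimal sketch, not Cytoscape's internal model; the class name `Network`, its methods, and the example node names are all hypothetical.

```python
class Network:
    """Tiny multigraph: edges may be directed, weighted, parallel, or self-edges."""

    def __init__(self):
        self.nodes = set()
        # Each edge is (source, target, interaction, weight, directed).
        self.edges = []

    def add_edge(self, source, target, interaction, weight=1.0, directed=False):
        self.nodes.update((source, target))
        self.edges.append((source, target, interaction, weight, directed))

    def neighbors(self, node):
        """Nodes reachable in one step, honouring edge direction."""
        found = set()
        for source, target, _interaction, _weight, directed in self.edges:
            if source == node:
                found.add(target)
            elif target == node and not directed:
                found.add(source)
        return found


# Hypothetical example data, for illustration only.
net = Network()
net.add_edge("TF1", "geneA", "activates", directed=True)   # directed regulation
net.add_edge("protA", "protB", "binds", weight=0.8)        # weighted, undirected
net.add_edge("protA", "protB", "coexpressed", weight=0.6)  # parallel (multiple) edge
net.add_edge("protA", "protA", "binds")                    # self-edge, e.g. a homodimer
```

Storing edges as a list rather than a set is what allows multiple edges between the same pair of nodes.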

13 Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

14 Human Genetics Creating a network

15 Human Genetics Free-format Text and Excel Files Specify Input File Define Columns Text Parsing Options Preview

16 Human Genetics Pathways: plenty of resources. http://pathguide.org lists over 240 pathway databases.

17 Human Genetics All kinds of network data…
Physical interactions:
– Protein-protein interactions
– Protein-DNA interactions
– Metabolic interactions
Functional interactions:
– Co-expression relations
– Genetic interactions
– Knockout/siRNA targets

18 Human Genetics Pre-formatted Network Files
Cytoscape supports many popular file formats:
– SIF (Simple Interaction Format)
– GML (Graph Modelling Language)
– XGMML (eXtensible Graph Markup and Modeling Language)
– BioPAX (Biological Pathway Exchange)
– PSI-MI 1 & 2.5 (Proteomics Standards Initiative, Molecular Interactions)
– SBML Level 2 (Systems Biology Markup Language)
Available for download from data sources (URLs, web services, formatted table files)

19 Human Genetics Internet Databases
Cytoscape version 2.6 web service clients import networks directly from several trusted internet resources:
– IntAct (EMBL-EBI)
– PathwayCommons (collection of data resources)
– NCBI Entrez Gene
– Many more will be included...

20 Human Genetics Interaction Database Search Import Visualize and Analyze

21 Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

22 Human Genetics What are Attributes?
Any data that describes or provides details about the nodes and edges in the network:
– Gene expression data
– Mass spectrometry data
– Protein structure information
– Gene Ontology (GO) terms
– Interaction confidence values, etc.
Cytoscape supports multiple data types:
– Numbers (integers, floats)
– Text (strings)
– Logical (booleans)
– Lists…

23 Attribute Management [Screenshot callouts: node or edge ID; specific attribute tabs; select attributes for display; string and floating-point attribute types]

24 Human Genetics Load Attributes: Import Attribute Files
Map data about networks onto networks. Attributes can be loaded in many of the same ways as networks:
– Import pre-formatted attribute files
– Import formatted text or Excel files
– Create attributes manually in the attribute editor
– Load attributes from web services
– ID mapping through node attributes

25 Human Genetics ID Mapping
– Mapping identifiers from one source to another is a major challenge
– Multiple levels of IDs, e.g. probe -> gene -> peptide -> protein
– Cytoscape provides ID mapping through the BioMart web service of EBI to convert the IDs
– Not perfect but sufficient
– Additional mapping mechanism underway
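Part of why multi-level ID mapping is hard is that each level can be one-to-many or missing. A minimal sketch of chaining lookup tables follows. It does not call BioMart; the tables, the identifiers, and the function `chain_map` are hypothetical, and the peptide level is left out for brevity.

```python
# Hypothetical lookup tables; a real analysis would fetch these from a mapping service.
PROBE_TO_GENE = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A", "GENE_B"],  # one probe can hit several genes
    "probe_3": [],                    # a probe with no known gene
}
GENE_TO_PROTEIN = {
    "GENE_A": ["PROT_A1", "PROT_A2"],  # one gene, several protein products
    "GENE_B": ["PROT_B"],
}


def chain_map(identifier, tables):
    """Follow an identifier through successive tables, collecting every hit."""
    current = {identifier}
    for table in tables:
        mapped = set()
        for item in current:
            mapped.update(table.get(item, ()))  # unknown IDs simply drop out
        current = mapped
    return current


TABLES = [PROBE_TO_GENE, GENE_TO_PROTEIN]
proteins_for_probe_2 = chain_map("probe_2", TABLES)
```

Returning a set instead of a single value makes the ambiguity explicit: an empty set means the ID was lost somewhere along the chain, and a set with several members means the mapping fanned out.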

26 Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

27 Human Genetics Visual Data Integration
1. Network Data
   YDR382W pp YDL130W
   YDR382W pp YFL039C
   YFL039C pp YCL040W
   YFL039C pp YHR179W
2. Attribute Data
   ExpressionValue
   YCL040W = 0.542
   YDL130W = -0.123
   YDR382W = -0.058
   YFL039C = 0.192
   YHR179W = 0.078
Integrated via the VizMapper
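The two inputs on this slide are simple enough to parse by hand, which helps show what "mapping data onto a network" means. The sketch below reads the slide's SIF lines and attribute lines into Python structures. The function names are hypothetical, and the attribute parser assumes the first line holds only the attribute name.

```python
SIF_TEXT = """\
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W
"""

ATTR_TEXT = """\
ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078
"""


def parse_sif(text):
    """Return (source, interaction, target) triples from SIF lines."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or a node listed without interactions
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:  # SIF allows several targets per line
            edges.append((source, interaction, target))
    return edges


def parse_node_attributes(text):
    """Return (attribute name, {node: value}) from 'node = value' lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    name = lines[0].strip()
    values = {}
    for line in lines[1:]:
        node, _, value = line.partition("=")
        values[node.strip()] = float(value)
    return name, values


edges = parse_sif(SIF_TEXT)
attr_name, expression = parse_node_attributes(ATTR_TEXT)
# Nodes whose expression value exceeds an arbitrary illustrative cutoff of 0.1.
upregulated = sorted(node for node, value in expression.items() if value > 0.1)
```

Because both structures are keyed on the same node identifiers, a visual mapping or a filter only needs a dictionary lookup per node.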

28 Human Genetics VizMapper List of Data Attributes Default Visual Style Editor List of Visual Attributes Mapping definition List of Visual Styles

29 Human Genetics Types of mappings
Continuous:
– Continuous data mapped to continuous visual attributes (e.g. gene expression levels mapped to node color)
– Continuous data mapped to discrete visual attributes (e.g. p-value categories mapped to node shape)
Discrete:
– Discrete (categorical) data mapped to discrete visual attributes (e.g. GO annotation mapped to node shape)
– Discrete data mapped to continuous visual attributes (e.g. multiple GO terms mapped to pie coloring)
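The difference between a continuous and a discrete mapping can be sketched in a few lines. This is an illustration of the idea only, not the VizMapper implementation; the blue-white-red colour scheme, the `limit` parameter, and the shape table are assumptions made for the example.

```python
def interpolate_color(value, low, high, low_rgb, high_rgb):
    """Linearly interpolate an RGB colour, clamping value to [low, high]."""
    if high == low:
        raise ValueError("low and high must differ")
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))


def expression_to_color(value, limit=1.0):
    """Continuous mapping: negative values shade to blue, positive to red."""
    white = (255, 255, 255)
    if value < 0:
        return interpolate_color(value, -limit, 0.0, (0, 0, 255), white)
    return interpolate_color(value, 0.0, limit, white, (255, 0, 0))


# Discrete mapping: each category gets a fixed visual value (hypothetical table).
SHAPE_BY_CATEGORY = {"transcription factor": "triangle", "kinase": "diamond"}


def category_to_shape(category):
    return SHAPE_BY_CATEGORY.get(category, "ellipse")  # default for unmapped categories
```

For instance, `expression_to_color(0.542)` gives a light red, while `category_to_shape("kinase")` returns `"diamond"` regardless of any numeric value.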

30 Human Genetics Network Filtering

31 Human Genetics Several Layout Algorithms Spring-embedded Circular Hierarchical

32 Human Genetics Linkout
– Nodes and edges act as hyperlinks to external databases
– User-configurable URLs
– Collection of the biological results for the publication
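A user-configurable link usually boils down to a URL template with a placeholder for the node identifier. The sketch below assumes a `%ID%` placeholder and uses a made-up `example.org` address; neither is taken from the slides, and the function `build_linkout` is hypothetical.

```python
from urllib.parse import quote

# Hypothetical template; the placeholder name %ID% is an assumption.
GENE_TEMPLATE = "https://example.org/gene?query=%ID%"


def build_linkout(template, node_id):
    """Substitute a URL-escaped node identifier into the template."""
    return template.replace("%ID%", quote(node_id, safe=""))


link = build_linkout(GENE_TEMPLATE, "YFL039C")
```

Escaping the identifier matters because node names may contain spaces or slashes that would otherwise break the URL.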

33 Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

34 Human Genetics Prepare for Publication: fine-tune the figures
– Manual layout manipulation options (align, scale, rotate)
– Manually override visual styles: place labels, change colors, etc.

35 Human Genetics Finalizing the Figures
– Publication-quality graphics in several formats: PDF, EPS, SVG, PNG, JPEG, and BMP
– Export session to HTML for the web

36 Human Genetics Cytoscape: So what? The big Pro Cyto argument: EXTENSIBLE Plugins, Plugins, Plugins –In our case enabled extended array data analysis

37 Human Genetics Cytoscape is Extensible
– Cytoscape is free, open-source software
– A plugin interface allows any programmer to write their own extensions to Cytoscape
– Plugins are the primary biological analysis mechanism in Cytoscape
– Plugins are distributed from a central Cytoscape database and can be installed while Cytoscape is running
– Several dozen plugins are currently available (www.cytoscape.org/plugins/index.php)

38 Human Genetics Hello World Plugin http://cytoscape.org/cgi-bin/moin.cgi/Hello_World_Plugin http://cytoscape.org/cgi-bin/moin.cgi/Developer_Homepage

39 Human Genetics Extending the workflow through plugins Graph based integration and analysis of molecular biological data

40 Human Genetics Integrative Bioinformatics in our group
Aggregate data:
– 18000+ Affymetrix arrays (tumor series, public data, experiments)
– Manipulate cell lines; lentiviral library
Search/Visualize/Selection: R2
– Statistical cutoffs
– Correlations: R2
– Clinical data coupling
Analysis/Feedback: R2 and Cytoscape
– Known interactions
– Transcription factor binding

41 Human Genetics Integrative Bioinformatics in our group [Architecture diagram; components shown: external data sources, statistical analysis, Perl module, Cytoscape webstart, AMC plugin, canonical paths DB, patient data, GEO arrays, algorithms, array data (tumor and experiments), R2 array analysis interface, Cytoscape interface, HGServer]

42 Human Genetics Array data analysis: R2 Mainly work by Jan Koster

43 Human Genetics R2 interface: Demo

44 Human Genetics R2 interface

45 Human Genetics R2 interface

46 Human Genetics R2 interface

47 Human Genetics R2 interface

48 Human Genetics R2 interface

49 Human Genetics Timeseries in R2 / Cytoscape (Demo)

50 Human Genetics Timeseries in R2

51 Human Genetics Timeseries in R2

52 Human Genetics Timeseries in R2 Integration with Cytoscape through webstart

53 Human Genetics Timeseries in Cytoscape: Visualization

54 Human Genetics Timeseries in Cytoscape: Aggregate data

55 Human Genetics Timeseries in Cytoscape: Search/Filter

56 Human Genetics Timeseries in Cytoscape: Filter

57 Human Genetics Timeseries in Cytoscape

58 Human Genetics Timeseries in Cytoscape

59 Human Genetics Tf (green) and partners (red)

60 Human Genetics Filtering

61 Human Genetics Filtering

62 Human Genetics Coloring, layout

63 Human Genetics In summary:
1. Aggregate: combine NOTCH3 knockout data with TF and PPI data
2. Search/Visualize: lay out time series; find downstream targets
3. Analyze/Feedback: identify MSX1; knockout in a new experiment

64 Human Genetics More Plugin Examples
– BiNGO (enriched GO categories found in the sub-network)
– WikiPathways (visualize curated pathways)
– MCODE (putative protein complexes)
– GenePro (protein-protein interaction cluster visualization)
– jActiveModules (search for significant sub-networks)
– NetworkAnalyzer (statistical analysis of networks)
– Agilent Literature Search (network creation)
– CyGoose (Gaggle communication)
See http://cytoscape.org/plugins for many more

65 Human Genetics Timeseries and BiNGO: Aggregate

66 Human Genetics Timeseries and BiNGO: Analyze

67 Human Genetics Timeseries and BiNGO

68 Human Genetics Timeseries and BiNGO

69 Human Genetics GOlorize plug-in (Pasteur)
– Node placement on the basis of both the connection structure (the edges) and the class structure (GO)
– A modification of the classic force-directed layout algorithm
– Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)

70 Human Genetics GOlorize plug-in interface Default settings for the class attractive force and separation factor Class-directed network layout

71 Human Genetics Example: genetic interaction network Standard Spring-embedded layout algorithm in Cytoscape

72 Human Genetics Example: genetic interaction network Spring-embedded layout algorithm with GO colour-coding

73 Human Genetics Example: genetic interaction network Final results of the GOlorize layout algorithm in Cytoscape Garcia et al. Bioinformatics 2007

74 Human Genetics Find Network Clusters: MCODE Plugin
– Network clusters are highly interconnected sub-networks that may also partly overlap
– Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways
– Clusters in a protein similarity network represent protein families
– Network clustering is available through the MCODE Cytoscape plugin
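To give a feel for what "highly interconnected sub-network" means computationally, the sketch below extracts a k-core: the largest subgraph in which every node keeps at least k neighbours. This is a deliberately simpler stand-in, not the MCODE algorithm itself, and the toy network and function names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def k_core(adjacency, k):
    """Repeatedly drop nodes with fewer than k neighbours; return what survives."""
    remaining = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                for neighbour in remaining.pop(node):
                    if neighbour in remaining:  # guards against self-edges
                        remaining[neighbour].discard(node)
                changed = True
    return set(remaining)


# Toy network: a fully connected group A-B-C-D with a sparse tail D-E-F.
TOY_EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"),
]
dense_part = k_core(build_adjacency(TOY_EDGES), 3)
```

The tail nodes E and F are peeled away because they never have three neighbours, leaving the densely connected group as the candidate cluster.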

75 Human Genetics Network Clustering 7000 Yeast interactions among 3000 proteins

76 Human Genetics Bader & Hogue, BMC Bioinformatics 2003 4(1):2

77 Human Genetics [Figure labels: Proteasome 26S, Proteasome 20S, Ribosome, RNA Pol core, RNA Splicing] Bader & Hogue, BMC Bioinformatics 2003 4(1):2

78 Human Genetics Find Network Motifs: NetMatch plugin
– A network motif is a sub-network that occurs significantly more often than by chance alone
– Input: query and target networks, optional node/edge labels
– Output: topological query matches as subgraphs of the target network
– Supports: subgraph matching, node/edge labels, label wildcards, approximate paths
http://alpha.dmi.unict.it/~ctnyu/netmatch.html
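The input/output contract described here (a query graph in, matching subgraphs of the target out) can be illustrated with a brute-force matcher. This sketch is not NetMatch's algorithm and only scales to tiny graphs; it covers node labels with a `?` wildcard but not approximate paths or motif significance. All names and the example data are hypothetical.

```python
from itertools import permutations


def find_matches(query_edges, target_edges, query_labels=None, target_labels=None):
    """Return every mapping of query nodes onto target nodes that preserves
    all query edges (undirected) and respects node labels ('?' matches any)."""
    query_labels = query_labels or {}
    target_labels = target_labels or {}
    query_nodes = sorted({node for edge in query_edges for node in edge})
    target_nodes = sorted({node for edge in target_edges for node in edge})
    target_set = {frozenset(edge) for edge in target_edges}
    matches = []
    for candidate in permutations(target_nodes, len(query_nodes)):
        mapping = dict(zip(query_nodes, candidate))
        labels_ok = all(
            query_labels.get(q, "?") in ("?", target_labels.get(t))
            for q, t in mapping.items()
        )
        edges_ok = all(
            frozenset((mapping[a], mapping[b])) in target_set
            for a, b in query_edges
        )
        if labels_ok and edges_ok:
            matches.append(mapping)
    return matches


# Target: a triangle A-B-C with one extra edge C-D. Query: any triangle.
TARGET = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
QUERY = [("x", "y"), ("y", "z"), ("x", "z")]
triangle_matches = find_matches(QUERY, TARGET)
labelled_matches = find_matches(
    QUERY, TARGET, query_labels={"x": "kinase"}, target_labels={"A": "kinase"}
)
```

The unlabelled query finds the same triangle six times, once per ordering of its nodes; adding the label on `x` pins it to node A and leaves two mappings.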

79 Human Genetics Finding query sub-networks [Figure panels: Query, Results] Ferro et al. Bioinformatics 2007

80 Human Genetics Finding Signaling Pathways
Potential signaling pathways from plasma membrane to nucleus via cytoplasm
– NetMatch query: shortest path between subgraph matches
– Signaling pathway example: MAP kinase cascade (Ras, Raf-1, Mek, MAPK, TFs; nucleus: growth control, mitogenesis)
[Figure panels: NetMatch query, signaling pathway example, NetMatch results]
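The "shortest path" part of such a query can be sketched with a plain breadth-first search. The edge list below uses the node names from the slide; the direction of each edge follows the textbook MAP kinase cascade and is an assumption, as is the function name.

```python
from collections import deque

# Edge directions assumed from the standard Ras -> Raf -> Mek -> MAPK cascade.
CASCADE = [("Ras", "Raf-1"), ("Raf-1", "Mek"), ("Mek", "MAPK"), ("MAPK", "TFs")]


def shortest_path(edges, start, goal):
    """Breadth-first search over directed edges; returns a node list or None."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


membrane_to_nucleus = shortest_path(CASCADE, "Ras", "TFs")
```

Because the edges are directed, searching from `TFs` back to `Ras` finds no path, which mirrors the one-way flow of the signal.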

81 Human Genetics Find Active Subnetworks
Active modules are sub-networks that show differential expression over user-specified conditions or time points:
– Microarray gene-expression attributes
– Mass-spectrometry protein abundance
Method:
– Calculate a z-score per node and an aggregate z_A score per subgraph; correct for random expression data sampling
– Score over multiple experimental conditions
– A simulated annealing-based search is used to find the high-scoring networks
Ideker T, Ozier O, Schwikowski B, Siegel AF. Bioinformatics 2002;18 Suppl 1:S233-40
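The scoring step can be sketched as follows, using the formulation from the cited paper: each p-value becomes z_i = Φ⁻¹(1 − p_i), a subgraph of k nodes gets z_A = Σ z_i / √k, and z_A is then calibrated against randomly drawn gene sets of the same size. This is a simplified illustration for a single condition; the simulated annealing search is not included, and the p-values and function names are made up.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def node_z(p_value):
    """Convert a p-value (0 < p < 1) to a z-score; small p gives large z."""
    return NormalDist().inv_cdf(1.0 - p_value)


def aggregate_z(z_scores):
    """Aggregate score z_A for a set of k nodes."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def calibrated_score(subnet_nodes, z_by_node, samples=1000, seed=0):
    """Compare z_A with the scores of random node sets of the same size."""
    rng = random.Random(seed)
    k = len(subnet_nodes)
    all_z = list(z_by_node.values())
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(samples)]
    z_a = aggregate_z([z_by_node[node] for node in subnet_nodes])
    return (z_a - mean(background)) / stdev(background)


# Hypothetical p-values for eight genes.
P_VALUES = {
    "g1": 0.001, "g2": 0.004, "g3": 0.01, "g4": 0.2,
    "g5": 0.4, "g6": 0.5, "g7": 0.7, "g8": 0.9,
}
Z_BY_NODE = {gene: node_z(p) for gene, p in P_VALUES.items()}
module_score = calibrated_score(["g1", "g2", "g3"], Z_BY_NODE)
```

Dividing by √k keeps scores comparable across subgraphs of different sizes, and the calibration step is what the slide calls correcting for random expression data sampling.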

82 Human Genetics Finding active modules
jActiveModules plug-in (Ideker T et al. Science 2001; Bioinformatics 2002)
Input: interaction network and p-values for gene expression values over several conditions
Output: significant sub-networks that show differential expression over one or several conditions

83 Human Genetics Cerebral: Cellular location and expression data

84 Human Genetics Concluding
– Cytoscape is a proven, valuable tool for integrative bioinformatics
– Easily extensible: well suited to answering new biological research questions
– Analyses can be tedious for biologists; it is up to bioinformaticians to translate these into simple workflows
– Therefore: bioinformaticians, integrate into wet-lab research groups!

85 Human Genetics Some notes…
Plugin lifetime:
– Maintenance
– Interoperability
Visualization issues…
– Standard biologist layouts
– Fancy visuals
Cytoscape 3.0 aims to solve these issues (amongst others)

86 Human Genetics Availability
Cytoscape:
– http://cytoscape.org
– cytoscape-discuss@googlegroups.com
– cytoscape-helpdesk@googlegroups.com
R2:
– Available shortly through http://humangenetics-amc.nl
– Keep posted via http://groups.google.com/group/r2-announce

