1
The STRING Database: What it does and how it interfaces to other resources
Christian von Mering, University of Zurich & SIB
bigDATA Workshop
2
STRING (http://string-db.org/)
- viewers for all types of evidence
- focus on usability and speed
- integrated scoring scheme
- information transfer between species
Evidence sources: Genomic Neighborhood, Genes/Species Co-occurrence, Gene Fusions, Database Imports, Exp. Interaction Data, Co-expression, Literature Co-occurrence
3
Numbers (http://string-db.org):
630 organisms
2.6 million proteins
88 million interactions
server footprint: 320 GB
4
Interaction prediction from genome information (“genomic context”):
- Conserved Neighborhood
- Gene Fusions
- Phylogenetic Profiles
quantify … integrate … networks
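To make the phylogenetic-profile idea concrete, here is a minimal sketch, assuming orthologous groups (e.g. COG identifiers) have already been assigned in every genome. The function names, species and group labels are hypothetical. Each group gets a presence/absence vector across species, and groups with identical vectors are flagged as candidate functional partners.

```python
from itertools import combinations


def phylogenetic_profiles(
    genomes: dict[str, set[str]], groups: list[str]
) -> dict[str, tuple[int, ...]]:
    """Presence (1) / absence (0) of each orthologous group across species.

    Species are sorted so that every profile uses the same column order.
    """
    species = sorted(genomes)
    return {g: tuple(1 if g in genomes[s] else 0 for s in species) for g in groups}


def identical_profile_pairs(profiles: dict[str, tuple[int, ...]]) -> list[tuple[str, str]]:
    """Pairs of groups that are present/absent together in every species."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if profiles[a] == profiles[b]
    ]


# Hypothetical toy data: three genomes, three orthologous groups.
genomes = {
    "species_1": {"COG_A", "COG_B"},
    "species_2": {"COG_A", "COG_B", "COG_C"},
    "species_3": {"COG_C"},
}
profiles = phylogenetic_profiles(genomes, ["COG_A", "COG_B", "COG_C"])
print(identical_profile_pairs(profiles))  # [('COG_A', 'COG_B')]
```

Exact matching is the simplest possible criterion; a graded similarity (such as the distance-based raw score used for phylogenetic profiles) lets near-identical profiles contribute as well.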
5
Other interaction sources:
- Interaction Databases
- Pathway Databases (e.g. Reactome)
- Automated Textmining
- Interolog Transfer
6
The scoring system
Final interaction score, e.g. protein A – protein B: 0.856
- between 0 and 1; a pseudoprobability, the “likelihood of functional association”
- combined from the channels neighborhood (nscore), fusion (fscore), co-occurrence (pscore), co-expression (cscore), experimental (escore) and textmining (tscore):
$$\text{score} = 1 - (1 - \text{nscore})(1 - \text{fscore})(1 - \text{pscore})(1 - \text{cscore})(1 - \text{escore})(1 - \text{tscore})$$
Evidence transfer between species:
$$\text{nscore} = 1 - (1 - \text{nscore}_{\text{query species}})(1 - \text{nscore}_{\text{transf.}})$$
- information is transferred between species either via orthologs (COG database) or via homology
- analogous for cscore, escore, tscore, ...
Benchmarking:
- each predictor has its own raw-score regime
- raw score vs. KEGG performance (fraction of pairs on the same KEGG map)
- example (neighborhood): intergenic distances between gene A and gene B of 100 bp, 6 bp, 20 bp; raw score: sum of intergenic distances
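The combination rule above can be sketched in a few lines. This is a minimal illustration that follows the slide's formula literally, assuming each channel score has already been benchmarked into [0, 1]; the function names and the channel values in the example are hypothetical, and the production pipeline may apply corrections not shown on the slide.

```python
from math import prod


def combine(scores: dict[str, float]) -> float:
    """Combined score = 1 - product over channels of (1 - channel score).

    Assumes each channel score (nscore, fscore, ...) is already benchmarked
    into [0, 1]; channels without evidence can simply be left out.
    """
    for name, s in scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {s}")
    return 1.0 - prod(1.0 - s for s in scores.values())


def with_transfer(query_species: float, transferred: float) -> float:
    """Channel score after adding evidence transferred from other species:
    1 - (1 - score_query_species) * (1 - score_transferred)."""
    return combine({"query species": query_species, "transferred": transferred})


# Hypothetical channel values for one protein pair.
nscore = with_transfer(0.30, 0.40)  # neighborhood: direct + transferred evidence
pair = {"nscore": nscore, "cscore": 0.5, "tscore": 0.6}
print(round(combine(pair), 3))  # 0.916
```

Because each factor (1 − score) is at most 1, adding a channel can only raise the combined score, and a single channel close to 1 dominates the result.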
7
The raw score regimes
Neighborhood: intergenic distances between gene A and gene B (e.g. 100 bp, 6 bp, 20 bp); raw score: sum of intergenic distances
Phylogenetic profiles: “similarity profiles”; singular value decomposition; raw score: Euclidean distance; filter: down-weight scores for homologous pairs
Fusion: raw score: constant (0.99)
Experimental interactions: two-hybrid, TAP, annotated complexes, …; topology-based analysis: who with whom, how many other partners?; raw score: various (usually ‘uniqueness’ of interaction)
Co-expression: download all microarray datasets for a given species; data normalization (spatial correction); raw score: pairwise Pearson correlation coefficient
Textmining: download all PubMed abstracts; identify proteins in the abstracts; search for co-mentioned pairs; raw score: log-odds score
8
User-Experience: Aiming to be Visual and Intuitive
9
Usage:
1’000 visits / day
800 users / day
9’000 pageviews / day
> 10’000 DB queries / day
10
Citations
2000 NAR, Snel et al.: 80 citations
2003 NAR, von Mering et al.: 215 citations
2005 NAR, von Mering et al.: 183 citations
2007 NAR, von Mering et al.: 189 citations
2009 NAR, Jensen et al.: 47 citations
Total: 714 citations
11
Cross-links
- SMART: protein domain information
- GENECARDS: info and products on human genes
- SWISS-MODEL-REPOSITORY: homology models
- CYTOSCAPE: access via plug-in architecture
- SWISSPROT / UNIPROT: expert protein annotation
12
Cross-link example: launch SwissModel
13
Reciprocal view: popup to launch STRING
14
Example #1: A missing chaperone for cytochrome c oxidase
Question: who inserts the copper atom into CcO?
15
Example #1: The missing chaperone for cytochrome c oxidase
Initial observation:
16
Example #1: The missing chaperone for cytochrome c oxidase
- gene expressed
- structure solved
- it binds copper!
- likely function: copper delivery
17
Example #2: Simplify discovery in genome-wide association screens?
Christian von Mering – UZH MolBio – SIB
18
In-House Use of STRING
a) download data in relational database schema
b) download data as compact flat files
c) in-house installation of the web server
d) cross-link to server (version-controlled; to network, protein, link, ...)
e) PSI-MI export
f) [SOAP / web services]
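For option b), a minimal sketch of loading a compact flat file. It assumes a whitespace-separated layout with a header line beginning `protein1 protein2`, the combined score in the last column, and scores scaled to 0–1000; verify these assumptions against the release you actually download. `open_links` and `read_links` are hypothetical helper names, and the 700 cut-off is an arbitrary default.

```python
import gzip
import io
from typing import Iterator, TextIO


def open_links(path: str) -> TextIO:
    """Open a links file as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_links(handle: TextIO, min_score: int = 700) -> Iterator[tuple[str, str, float]]:
    """Yield (protein1, protein2, score in [0, 1]) for links >= min_score.

    Assumes the combined score is the last column, as an integer 0..1000.
    """
    header = handle.readline().split()
    if header[:2] != ["protein1", "protein2"]:
        raise ValueError(f"unexpected header: {header}")
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        score = int(fields[-1])
        if score >= min_score:
            yield fields[0], fields[1], score / 1000


# Toy input with hypothetical protein identifiers.
sample = io.StringIO("protein1 protein2 combined_score\nprotA protB 900\nprotA protC 150\n")
print(list(read_links(sample)))  # [('protA', 'protB', 0.9)]
```

Streaming the file as a generator keeps memory flat, which matters when a single organism's link file holds millions of pairs.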
19
Version 9.0: exceeding 1000 genomes
Core organisms:
- include all model organisms (annotated knowledge)
- non-redundant, each genus is covered
- include organisms with functional genomics data
Irrelevant organisms [future category]
20
More details & new features
21
“Payload Display”: your own STRING server
=> “branding” STRING via remote control: a call-back API
22
Acknowledgements
The STRING team: Samuel Chaffron, Manuel Weiss, Michael Kuhn, Lars Juhl Jensen, Sean Hooper, Berend Snel, Martijn Huynen, Peer Bork
The STRING institutions: SIB – Swiss Institute of Bioinformatics; University of Zurich; TU Dresden; University of Copenhagen; European Molecular Biology Laboratory
24
“MySTRING”
- users can register / log in (using OpenID or similar for authentication)
- persistence of search results (“history”)
- store lists / items of interest (“bag of genes”)
- users can customize the interface
- generate revenue (?)
25
Feature #2 (Finding Relevant Texts)
26
Example #2: The missing enzymes for uric acid degradation
Question: why can’t humans degrade uric acid?
27
Example #2: The missing enzymes for uric acid degradation
28
Example #2: The missing enzymes for uric acid degradation
Initial observation:
29
Example #2: The missing enzymes for uric acid degradation
- genes cloned, expressed
- enzymatic activity demonstrated
- candidate short-term therapeutics!