
1 The STRING Database: What it does and how it interfaces to other resources. Christian von Mering, University of Zurich & SIB. bigDATA Workshop

2 STRING http://string-db.org/
Evidence types: Genomic Neighborhood, Genes/Species Co-occurrence, Gene Fusions, Database Imports, Exp. Interaction Data, Co-expression, Literature Co-occurrence
- viewers for all types of evidence
- focus on usability and speed
- integrated scoring scheme
- information transfer between species

3 Numbers (http://string-db.org): 630 organisms, 2.6 million proteins, 88 million interactions; server footprint: 320 Gb

4 Interaction prediction from genome information (“genomic context”): Phylogenetic Profiles, Conserved Neighborhood, Gene Fusions; quantify … integrate … networks

5 Other Interaction Sources: Interaction Databases, Pathway Databases (e.g. Reactome), Automated Textmining, Interolog Transfer

6 The scoring system
final interaction score: protein A – protein B 0.856 (between 0 and 1, pseudoprobability, “likelihood of functional association”)
final score = 1 – (1 – nscore) * (1 – fscore) * (1 – pscore) * (1 – cscore) * (1 – escore) * (1 – tscore)
nscore: neighborhood, fscore: fusion, pscore: co-occurrence, cscore: coexpression, escore: experimental, tscore: textmining
evidence transfer between species: nscore = 1 – (1 – nscore query species) * (1 – nscore transf.); analogous for cscore, escore, tscore, ...
information transfer between species either via orthologs (COG database) or via homology
raw score: each predictor has its own raw-score regime
Example - Neighborhood: gene A, gene B; 100 bp, 6 bp, 20 bp; raw score: sum of intergenic distances
benchmarking: raw score vs. KEGG performance (fraction on same map)
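The combination rule on this slide can be sketched in a few lines of Python. This is a minimal illustration of the formula as written on the slide, not the STRING implementation; the function names and the channel values below are hypothetical.

```python
def combine_scores(scores):
    """Combine independent evidence scores as 1 - prod(1 - s)."""
    remaining = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
        remaining *= 1.0 - s
    return 1.0 - remaining


def with_transfer(score_query_species, score_transferred):
    """Per-channel score after evidence transfer between species.

    Mirrors the slide: nscore = 1 - (1 - nscore_query) * (1 - nscore_transf).
    """
    return combine_scores([score_query_species, score_transferred])


# Hypothetical per-channel scores for one protein pair (illustrative values only).
channels = {
    "nscore": 0.2,  # neighborhood
    "fscore": 0.0,  # fusion
    "pscore": 0.1,  # co-occurrence
    "cscore": 0.3,  # coexpression
    "escore": 0.5,  # experimental
    "tscore": 0.4,  # textmining
}
final_score = combine_scores(channels.values())
print(round(final_score, 4))
```

Because every factor (1 – score) is at most 1, adding a channel can only raise the final score or leave it unchanged, which matches the "pseudoprobability" reading on the slide.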

7 The raw score regimes
Neighborhood: gene A, gene B; 100 bp, 6 bp, 20 bp; raw score: sum of intergenic distances
Phylogenetic profiles: “similarity profiles”; singular value decomposition; raw score: Euclidean distance; filter: downweight scores for homologous pairs
Fusion: raw score: constant (0.99)
Experimental interactions: two-hybrid, TAP, annotated complexes, …; topology-based analysis: who with whom, how many other partners?; raw score: various (usually ‘uniqueness’ of interaction)
Co-expression: download all microarray datasets for a given species; data normalization (spatial correction); raw score: pairwise Pearson correlation coefficient
Textmining: download all PubMed abstracts; identify proteins in the abstracts; search for co-mentioned pairs; raw score: log-odds score
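Two of these raw-score regimes are simple enough to sketch directly. The Python below is an illustrative sketch only: the function names are hypothetical, the normalization and filtering steps named on the slide are omitted, and nothing here is taken from the STRING code base.

```python
import math


def neighborhood_raw_score(intergenic_distances_bp):
    """Raw neighborhood score: sum of intergenic distances (in bp)."""
    return sum(intergenic_distances_bp)


def pearson(x, y):
    """Pairwise Pearson correlation coefficient of two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Distances from the slide's neighborhood example: 100 bp, 6 bp, 20 bp.
neighborhood = neighborhood_raw_score([100, 6, 20])

# Hypothetical expression values for two genes across four arrays.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.1, 3.9, 6.2, 7.8]
coexpression = pearson(gene_a, gene_b)
```

Each raw score lives on its own scale (base pairs, a correlation in [-1, 1], a log-odds value), which is why the previous slide benchmarks every regime against KEGG before the scores are combined.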

8 User-Experience: Aiming to be Visual and Intuitive

9 1’000 visits / day 800 users / day 9’000 pageviews / day > 10’000 DB-queries / day

10 Citations
2000 NAR Snel et al.: 80 citations
2003 NAR von Mering et al.: 215 citations
2005 NAR von Mering et al.: 183 citations
2007 NAR von Mering et al.: 189 citations
2009 NAR Jensen et al.: 47 citations
total: 714 citations

11 Cross-links
SMART: protein domain information
GENECARDS: info and products on human genes
SWISS-MODEL-REPOSITORY: homology models
CYTOSCAPE: access via plug-in architecture
SWISSPROT / UNIPROT: expert protein annotation

12 Cross-link example launch SwissModel

13 Reciprocal View popup: launch STRING

14 Example #1 A missing chaperone for Cytochrome C oxidase. Question: who inserts the copper atom into CcO?

15 Example #1 The missing chaperone for Cytochrome C oxidase. Initial observation:

16 Example #1 The missing chaperone for Cytochrome C oxidase gene expressed structure solved it binds copper ! likely function - copper delivery

17 Example #2 Simplify discovery in genome-wide association screens ? Christian von Mering – UZH MolBio – SIB

18 In-House Use of STRING
a) download data in relational database schema
b) download data as compact flat-files
c) in-house installation of webserver
d) cross-link to server (version controlled, to network, protein, link, ...)
e) PSI-MI export
f) [ SOAP / webservices ]
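For option b), a flat file of interactions can be filtered with very little code. The sketch below assumes a layout the slides do not specify: a header line followed by whitespace-separated columns `protein1 protein2 combined_score`, with the score stored as an integer scaled by 1000. The function name, the layout, and the protein identifiers are all assumptions for illustration.

```python
import io


def read_links(handle, min_score=0.0):
    """Read (protein1, protein2, score) rows from an assumed flat-file layout.

    Assumed layout (not confirmed by the slides): one header line, then
    whitespace-separated 'protein1 protein2 combined_score' rows with the
    score stored as an integer in the range 0..1000.
    """
    links = []
    next(handle, None)  # skip the assumed header line
    for line in handle:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed rows
        protein_a, protein_b, raw = parts
        score = int(raw) / 1000.0  # rescale to the 0..1 range used on the slides
        if score >= min_score:
            links.append((protein_a, protein_b, score))
    return links


# Hypothetical sample data standing in for a downloaded file.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "protA protB 856\n"
    "protA protC 150\n"
)
links = read_links(sample, min_score=0.4)
print(links)
```

With a real download, `sample` would be replaced by an open file handle; the column layout should be checked against the file's own header before relying on this parser.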

19 Version 9.0 – exceeding 1000 genomes
Core organisms:
- include all model organisms (annotated knowledge)
- non-redundant, each genus is covered
- include organisms with functional genomics data
Irrelevant Organisms [future category]

20 More details & new features

21 “Payload Display” - Your Own STRING Server
=> “branding” STRING
- via remote-control:
- a call-back API

22 Acknowledgements
The STRING team: Samuel Chaffron, Manuel Weiss, Michael Kuhn, Lars Juhl Jensen, Sean Hooper, Berend Snel, Martijn Huynen, Peer Bork
The STRING institutions: SIB – Swiss Institute of Bioinformatics, University of Zurich, TU-Dresden, University of Copenhagen, European Molecular Biology Laboratory


24 “MySTRING”
- users can register / login
- using OpenID or similar for authentication
- persistence of search results (“history”)
- store lists / items of interest (“bag of genes”)
- users can customize the interface
- generate revenue (?)

25 Feature #2 (Finding Relevant Texts)

26 Example #2 The missing enzymes for uric acid degradation Question: why can’t humans degrade uric acid ?

27 Example #2 The missing enzymes for uric acid degradation ? ?

28 Example #2 The missing enzymes for uric acid degradation initial observation:

29 Example #2 The missing enzymes for uric acid degradation genes cloned, expressed enzymatic activity demonstrated candidate short-term therapeutics !

