BridgeDb Martijn van Iersel BiGCaT Maastricht. The 7 Virtues of Bioinformatics 1.Solve a problem 2.Start small 3.Modularity 4.Design for code re-use 5.Open.


1 BridgeDb Martijn van Iersel BiGCaT Maastricht

2 The 7 Virtues of Bioinformatics: 1. Solve a problem 2. Start small 3. Modularity 4. Design for code re-use 5. Open Source 6. Attention to detail 7. Eat your own dog food

3 Solve a problem What problem are you solving?

4 Problem: Identifier Mapping. Agilent reporter A46_P45789 ? Entrez Gene 3643

5 Solution: Conversion tools

6 Problem: Usability. Check for duplicate IDs; check for missing IDs; only 1000 at once; check alignment of Excel columns. Manual, error-prone.

7 Solution: Built-in Mapping. Generic bioinformatics platforms should have identifier mapping built in: BioConductor, PathVisio, Cytoscape... Batteries included.

8 Solution: Built-in Mapping. Entrez Gene 3643 <-> mapping service <-> Agilent reporter A46_P45789

9 Problem: Which mapping service? Synergizer, EnsMart, DAVID, CRONOS, AliasServer, MatchMiner, OntoTranslate

10 Solution: Abstraction Layer

11 interface IDMapper; class IDMapperRdb (relational database); class IDMapperFile (tab-delimited text); class IDMapperBiomart (web service)
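
The abstraction layer named on slides 10 and 11 can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the BridgeDb API: the names `SketchMapper`, `TabDelimitedMapper`, `AbstractionLayerDemo` and the method `map` are hypothetical, and the tab-delimited input is passed as in-memory lines so the example stays self-contained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the IDMapper interface named on the slide.
interface SketchMapper {
    // Returns all identifiers that the given source identifier maps to.
    Set<String> map(String sourceId);
}

// Hypothetical file-style implementation: each line is "source<TAB>target".
class TabDelimitedMapper implements SketchMapper {
    private final Map<String, Set<String>> table = new HashMap<>();

    TabDelimitedMapper(List<String> lines) {
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 2) continue; // skip malformed rows
            table.computeIfAbsent(cols[0], k -> new LinkedHashSet<>()).add(cols[1]);
        }
    }

    @Override
    public Set<String> map(String sourceId) {
        return table.getOrDefault(sourceId, Collections.emptySet());
    }
}

class AbstractionLayerDemo {
    // Client code depends only on the interface, so a database-backed or
    // web-service-backed mapper could be swapped in without changes here.
    static String describe(SketchMapper mapper, String id) {
        return id + " -> " + mapper.map(id);
    }

    public static void main(String[] args) {
        SketchMapper mapper = new TabDelimitedMapper(List.of("A46_P45789\t3643"));
        System.out.println(describe(mapper, "A46_P45789"));
    }
}
```

The point of the design is that tools such as those on the next slide only see the interface; which backend answers the query is decided elsewhere.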

12 Tools: PathVisio, WikiPathways, Cytoscape plugins (CyThesaurus, Network Merge). BridgeDb. Mapping services: Internet web services (BioMart, BridgeDb-REST, PICR), local database, tab-delimited text files. BMC Bioinformatics Jan 4;11(1):5

13 BridgeDb. Interface 1: Java interface. Interface 2: REST interface.

14 API Overview: BridgeDb.connect(...), IDMapper.mapID(...), Xref.getUrl(), DataSource.getUrl()

15 Easy & Flexible Code

16

17

18 BridgeDb. Interface 1: Java interface. Interface 2: REST interface.

19 REST API. Identifier (data source): ILMN_ (Illumina); Affy; NP_ (RefSeq); IPI (IPI); GO: (GeneOntology); NM_033282 (RefSeq); Affy; 94233 (Entrez Gene); ENSG (Ensembl Human); _at (Affy); A6NEB4 (Uniprot/TrEMBL); Illumina; GO: (GeneOntology); OMIM; A_23_P24234 (Agilent); 14449 (HUGO)

20 REST API / / [ /... ]\

21 R Example

22 Types of Mapping Services
Type | Advantages
Webservice | always up-to-date; no disk space required; no installation required
Relational database | highly efficient; versioned: updated only when you want it to be
Flat file | easy to customize

23 Available Mapping Services
Name | Type | Maintainer
Gene databases (Ensembl-based) | Database | Us
Metabolite databases (HMDB-based) | Database | Us
BridgeWebservice | Webservice | Us
BioMart | Webservice | EBI
CRONOS | Webservice | Helmholtz Zentrum
Synergizer | Webservice | Harvard Medical School
PICR | Webservice | EBI

24 Problem: Custom Microarrays Custom probe #QXZCY!34 ?

25 Solution: Stacking. EnsMart + custom table

26 Ensembl, Entrez, custom microarray. Relation defined by mapping source A; relation defined by mapping source B; inferred, transitive relationship.
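
The stacking idea on slides 25 and 26 can be illustrated with a short Java sketch. It assumes each mapping source is a simple in-memory map from one identifier to a set of others, and it follows mappings across all stacked sources until no new identifiers appear. The class `StackedMapper` and the method `mapTransitive` are hypothetical names, the mappings are treated as one-directional for simplicity, and the target identifiers in the demo are made-up placeholders, not real database entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: combines several mapping sources and follows
// mappings transitively across them.
class StackedMapper {
    private final List<Map<String, Set<String>>> sources;

    StackedMapper(List<Map<String, Set<String>>> sources) {
        this.sources = sources;
    }

    // Breadth-first search over all sources: collects every identifier
    // reachable from startId through any chain of mappings, excluding
    // startId itself.
    Set<String> mapTransitive(String startId) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startId);
        queue.add(startId);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (Map<String, Set<String>> source : sources) {
                for (String next : source.getOrDefault(current, Set.of())) {
                    if (seen.add(next)) queue.add(next); // visit each id once
                }
            }
        }
        seen.remove(startId);
        return seen;
    }
}

class StackingDemo {
    public static void main(String[] args) {
        // Source A: custom table (custom probe -> placeholder Ensembl id).
        Map<String, Set<String>> customTable = Map.of("#QXZCY!34", Set.of("ENSEMBL_PLACEHOLDER"));
        // Source B: general mapping (placeholder Ensembl id -> placeholder Entrez id).
        Map<String, Set<String>> general = Map.of("ENSEMBL_PLACEHOLDER", Set.of("ENTREZ_PLACEHOLDER"));
        StackedMapper stacked = new StackedMapper(List.of(customTable, general));
        // The probe-to-Entrez link is inferred; neither source states it directly.
        System.out.println(stacked.mapTransitive("#QXZCY!34"));
    }
}
```

Tracking visited identifiers keeps the search from looping when two sources map back and forth between the same identifiers.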

27 Comparison

28

29 CyThesaurus

30 MIRIAM Resources

31 Solution: MIRIAM Resources. Regular expression for autodetection; pattern for generating URLs; link to documentation.
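
The first two items on this slide, a regular expression for autodetection and a pattern for generating URLs, could be combined along these lines. This is a sketch under assumptions: `ResourceEntry` and `ResourceRegistry` are hypothetical classes, the `$id` placeholder syntax is invented for the example, and the regular expressions and `example.org` URLs used in the demo are simplified stand-ins rather than actual MIRIAM entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical record of what the slide lists per resource: a regular
// expression for autodetection and a URL pattern with an "$id" placeholder.
class ResourceEntry {
    final String name;
    final Pattern idPattern;
    final String urlPattern;

    ResourceEntry(String name, String idRegex, String urlPattern) {
        this.name = name;
        this.idPattern = Pattern.compile(idRegex);
        this.urlPattern = urlPattern;
    }

    // True only if the whole identifier matches the pattern.
    boolean matches(String id) {
        return idPattern.matcher(id).matches();
    }

    // Literal substitution of the placeholder; no regex semantics involved.
    String urlFor(String id) {
        return urlPattern.replace("$id", id);
    }
}

class ResourceRegistry {
    private final List<ResourceEntry> entries = new ArrayList<>();

    void register(ResourceEntry entry) {
        entries.add(entry);
    }

    // Returns the first registered resource whose pattern matches the identifier.
    Optional<ResourceEntry> detect(String id) {
        return entries.stream().filter(e -> e.matches(id)).findFirst();
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        ResourceRegistry registry = new ResourceRegistry();
        // Simplified, illustrative patterns and placeholder URLs.
        registry.register(new ResourceEntry("Ensembl", "ENSG\\d{11}", "https://example.org/ensembl/$id"));
        registry.register(new ResourceEntry("Entrez Gene", "\\d+", "https://example.org/gene/$id"));
        registry.detect("3643").ifPresent(e -> System.out.println(e.name + ": " + e.urlFor("3643")));
    }
}
```

Because several resources can share an identifier shape (plain integers, for instance), first-match detection like this is only a guess; registration order decides ties.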

32 The 7 Virtues of Bioinformatics: 1. Solve a problem 2. Start small 3. Eat your own dog food 4. Attention to detail 5. Modularity 6. Design for code re-use 7. Open Source

33 A Question to Linus Torvalds. Q: “Do you have any tips for people who want to undertake a large open source project?” A: “Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. … If it doesn't solve some fairly immediate need, it's almost certainly over-designed. … You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project.”

34 Also from Linus Torvalds “I'm right and anyone who disagrees is stupid and ugly” “My name is Linus Torvalds and I am your god.”

35 Code Re-Use. Reinventing the wheel is one of the 7 Deadly Sins of Bioinformatics.

36 Code Re-Use

37 Q: How to design re-usable code? A: Actually use it in more than one project from the start: BridgeDb, Cytoscape, PathVisio.

38 Modularity

39

40

41 Open source. Public money -> public code; reproducibility; academic ideal; trust; insurance against vendor lock-in.

42 Open source. Now where are all those free programmers?

43 Open source. Web site; version control; mailing list; bug tracker.

44

45 Eat your own dog food

46 Are you named “alkfdjlkdsf”? Why not “Hélène O’Brian”? …or “Bobby Tables”?

47 Eat your own dog food. Real data has missing values. Real data has commas instead of dots. Real data has duplicate identifiers. Real data starts with “ID” in the first cell* (*which Excel doesn’t like).
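
Three of the problems on this slide (missing values, decimal commas, duplicate identifiers) can be handled defensively when reading input. The following Java sketch shows one possible approach; `MessyInput`, `parseValue` and `load` are hypothetical names, and keeping the first occurrence of a duplicate identifier is an assumption made for the example, not a recommendation from the talk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MessyInput {
    // Parses a numeric cell. Returns null for missing or unparsable values
    // and accepts a decimal comma ("1,5") as well as a decimal point ("1.5").
    static Double parseValue(String cell) {
        if (cell == null) return null;
        String s = cell.trim();
        if (s.isEmpty()) return null;
        try {
            return Double.parseDouble(s.replace(',', '.'));
        } catch (NumberFormatException e) {
            return null; // e.g. "NA" or a thousands-separated number
        }
    }

    // Reads (identifier, value) rows. Keeps the first value seen for each
    // identifier and appends every repeated identifier to 'duplicates'.
    static Map<String, Double> load(List<String[]> rows, List<String> duplicates) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row.length == 0 || row[0].trim().isEmpty()) continue; // no identifier
            String id = row[0].trim();
            Double value = row.length > 1 ? parseValue(row[1]) : null;
            if (result.containsKey(id)) {
                duplicates.add(id);
                continue;
            }
            result.put(id, value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"ID1", "2.0"});
        rows.add(new String[] {"ID2", ""});    // missing value
        rows.add(new String[] {"ID1", "3,0"}); // duplicate identifier, decimal comma
        List<String> duplicates = new ArrayList<>();
        System.out.println(load(rows, duplicates) + " duplicates=" + duplicates);
    }
}
```

Replacing every comma with a dot is deliberately naive: a value such as "1,234.5" fails to parse and is treated as missing, which is safer than silently misreading it.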

48 User friendliness

49

50 Hallway usability testing. Grab a passer-by from the hallway and put them in front of your program. (We usually use students.)

51 Thanks: Alex Pico (UCSF), Kristina Hanspers (UCSF), Isaac Ho (UCSF), Bruce Conklin (UCSF), Jianjiong Gao (U. Missouri), Thomas Kelder (BiGCaT, Maastricht), Chris Evelo (BiGCaT, Maastricht), Brian Turner (U. Toronto), Igor Rodchenkov (U. Toronto)

52 Ways to run BridgeDb (1/3)

53 Ways to run BridgeDb (2/3)

54 Ways to run BridgeDb (3/3)

55 Open source. Is it difficult?

56 Open source = = rw

57 Open source = = rw * = r

