BioJava in 2002 An Open-Source Java Library for Bioinformatics (Matthew Pocock, BioJava Consulting LTD)

1 BioJava in 2002 An Open-Source Java Library for Bioinformatics (Matthew Pocock, BioJava Consulting LTD)

2 What is BioJava?
Java code (Java2 required – 1.2 and higher)
Open-Source Bioinformatics Library for building Applications
Sequence Centric (we’d love to do more)
Part of the Open Bioinformatics Foundation (OBF)
Drop biojava.jar into your CLASSPATH & go

3 Where is BioJava? #biojava on

4 Who is BioJava?
35+ Developers in most continents and time zones
Core team >5 individuals
Ever-expanding user group

5 A look at some API Stuff

6 What’s Been There for a While?
Sequences with hierarchical features
Sequence databases
Sequence IO
  – Various sequence formats (embl, genbank, gff, swissprot…)
  – Object model can be bypassed for high-performance scanning
Probability distributions over symbols and Dynamic programming toolkit
Blast Parsers

7 What’s Reasonably New?
TagValue parser API
Sequence Search APIs
  – Interoperable with BioJava XML-based parsers for many common sequence search algorithms
Pure-Java SSAHA implementation
Bit-packed sequence storage
Taxonomies
Literature References
Phred
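
As a rough illustration of what "bit-packed sequence storage" can mean, the sketch below stores unambiguous DNA at 2 bits per base, 32 bases per 64-bit word. The class name `PackedDna` and its methods are hypothetical and use only the Java standard library; this is not BioJava's implementation, and it deliberately ignores ambiguity symbols and gaps.

```java
/** Hypothetical illustration: unambiguous DNA stored at 2 bits per base. */
final class PackedDna {
    private static final String ALPHABET = "acgt"; // 2-bit code = index in this string
    private final long[] words;                    // 32 bases per 64-bit word
    private final int length;

    PackedDna(String seq) {
        length = seq.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toLowerCase(seq.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous base: " + seq.charAt(i));
            }
            // Place the 2-bit code at this base's slot inside its word.
            words[i / 32] |= ((long) code) << (2 * (i % 32));
        }
    }

    int length() { return length; }

    int wordCount() { return words.length; }

    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i / 32] >>> (2 * (i % 32))) & 3L);
        return ALPHABET.charAt(code);
    }

    public static void main(String[] args) {
        PackedDna dna = new PackedDna("GATTACA");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dna.length(); i++) {
            sb.append(dna.baseAt(i));
        }
        System.out.println(sb + " stored in " + dna.wordCount() + " long(s)");
    }
}
```

Compared with one Java `char` (16 bits) per base, this layout uses an eighth of the memory, at the cost of some shifting and masking on every access.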

8 What’s Recently Improved?
Gap handling
  – Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
DAS Client is now very robust
  – Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
More ‘framey’ annotation bundles
Sequence Rendering
  – Looks much better now
  – Handles ‘dotter-style’ 2d rendering
We now actually write JUnit Tests!
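
One way to picture a "consistent algebra" for ambiguities, compound symbols and gaps is to treat every symbol as the set of concrete bases it can stand for: `n` is the full set, a gap is the empty set, and a codon is a tuple of such sets. The sketch below makes that assumption explicit. The names `AmbiguityDemo`, `symbol`, `compatible` and `expansions` are hypothetical, only a few IUPAC codes are covered, and none of this is the BioJava symbol API.

```java
import java.util.EnumSet;

/** Hypothetical illustration: symbols modelled as sets of concrete bases. */
final class AmbiguityDemo {
    enum Base { A, C, G, T }

    /** Maps a character to the set of bases it can match (small IUPAC subset). */
    static EnumSet<Base> symbol(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return EnumSet.of(Base.A);
            case 'c': return EnumSet.of(Base.C);
            case 'g': return EnumSet.of(Base.G);
            case 't': return EnumSet.of(Base.T);
            case 'r': return EnumSet.of(Base.A, Base.G); // purine
            case 'y': return EnumSet.of(Base.C, Base.T); // pyrimidine
            case 'n': return EnumSet.allOf(Base.class);  // any base
            case '-': return EnumSet.noneOf(Base.class); // gap: assumed to match nothing
            default: throw new IllegalArgumentException("Unknown symbol: " + c);
        }
    }

    /** Two symbols are compatible if at least one concrete base satisfies both. */
    static boolean compatible(char x, char y) {
        EnumSet<Base> shared = symbol(x); // fresh set on each call, safe to mutate
        shared.retainAll(symbol(y));
        return !shared.isEmpty();
    }

    /** Number of concrete sequences a compound symbol (e.g. a codon) stands for. */
    static int expansions(String compound) {
        int count = 1;
        for (char c : compound.toCharArray()) {
            count *= symbol(c).size();
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("r vs a compatible: " + compatible('r', 'a'));
        System.out.println("r vs y compatible: " + compatible('r', 'y'));
        System.out.println("codons matched by atn: " + expansions("atn"));
        System.out.println("codons matched by a-t: " + expansions("a-t"));
    }
}
```

Under this model the same two operations (set intersection and product of set sizes) cover plain bases, ambiguity codes, gaps and codons without special cases.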

9 Java 1.4-reliant Source
Java 1.4 offers APIs that are really useful for Bioinformatics
  – Logging
  – NIO interfaces for fast IO and raw data access
  – Regular expressions
  – Cascading Exceptions
BioJava code relying on 1.4 APIs is conditionally built
  – SSAHA implementation
  – Some parsers and handlers for TagValue
  – Restriction enzyme digests
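
To make two of the Java 1.4 additions concrete, here is a small sketch that uses `java.util.regex` to locate restriction sites and a chained ("cascading") exception to report a bad site pattern. The class `DigestDemo` and its `digest` method are hypothetical and are not BioJava's restriction enzyme code. The example site is EcoRI (GAATTC, cut after the first G). The sketch also uses generics and the `(message, cause)` constructor, so it needs a newer Java than 1.4 to compile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical illustration: single-strand digest driven by a regular expression. */
final class DigestDemo {
    /**
     * Cuts seq at every match of site.
     * cutOffset is the cut position relative to the start of each match.
     */
    static List<String> digest(String seq, String site, int cutOffset) {
        Pattern pattern;
        try {
            pattern = Pattern.compile(site, Pattern.CASE_INSENSITIVE);
        } catch (PatternSyntaxException e) {
            // Chained exception: the regex error travels along as the cause.
            throw new IllegalArgumentException("Bad recognition site: " + site, e);
        }
        List<String> fragments = new ArrayList<>();
        Matcher m = pattern.matcher(seq);
        int start = 0;
        while (m.find()) {
            int cut = m.start() + cutOffset;
            fragments.add(seq.substring(start, cut));
            start = cut;
        }
        fragments.add(seq.substring(start)); // trailing fragment after the last cut
        return fragments;
    }

    public static void main(String[] args) {
        // EcoRI recognises GAATTC and cuts between G and A.
        System.out.println(digest("ttGAATTCaaGAATTCgg", "GAATTC", 1));
    }
}
```

A fuller digest would also handle the reverse strand, circular sequences and enzymes that cut outside their recognition site; those are left out here to keep the regex and exception handling visible.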

10 OBDA and Fun Trips
Sponsored by O’Reilly and Electric Genetics
Developers attended a two-part Hackathon in Tucson, AZ, USA and Cape Town, South Africa
Representatives from BioJava, BioPerl, BioPython, BioRuby, Ensembl, Emboss and others
We hammered out and implemented a range of standards designed from the ground up to be
  – Interoperable between the Bio* projects
  – Relatively easy to implement from scratch
We drank lots of red wine

11 OBDA Support
BioCORBA – corba sequence interfaces
BioSQL – relational tables and standard semantics for storing sequences
BioFetch – cgi-bin-based sequence fetching
XEMBL – xml-based sequence fetching
Bio Directories – configuration file for resolving resources
Flat-file Indexing – fetch records by ID and secondary ID from multiple ASCII files
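
The idea behind flat-file indexing can be sketched in a few lines: scan the file once, remember the byte offset at which each record starts, then seek straight to a record by ID. The example below does this for one FASTA-style ASCII file. The class `FlatFileIndexDemo`, its methods and the sample records are hypothetical. It does not follow the OBDA index file layout, and it handles neither secondary IDs nor multiple files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: ID -> byte offset index over one FASTA-style file. */
final class FlatFileIndexDemo {
    /** Scans the file once, recording where each '>' header line begins. */
    static Map<String, Long> buildIndex(Path file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // The ID is the first whitespace-delimited token after '>'.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    index.put(id, offset);
                }
                offset = raf.getFilePointer(); // start of the next line
            }
        }
        return index;
    }

    /** Seeks to the record for id and returns its text, or null if it is not indexed. */
    static String fetch(Path file, Map<String, Long> index, String id) throws IOException {
        Long offset = index.get(id);
        if (offset == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            StringBuilder record = new StringBuilder(raf.readLine()).append('\n');
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                record.append(line).append('\n');
            }
            return record.toString();
        }
    }

    /** Writes a tiny sample file, indexes it and fetches one record by ID. */
    static String demo(String id) {
        try {
            Path file = Files.createTempFile("seqs", ".fasta");
            try {
                String data = ">seq1 first\nACGT\nTTAA\n>seq2 second\nGGCC\n";
                Files.write(file, data.getBytes(StandardCharsets.US_ASCII));
                return fetch(file, buildIndex(file), id);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo("seq2"));
    }
}
```

`RandomAccessFile.readLine` is unbuffered and treats each byte as one character, which is acceptable for a short ASCII demonstration but would be slow on large files.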

12 Things We’d Like To Do in the Near Future
Support non-DNA areas of Bioinformatics
  – Cladistics, evolutionary trees, clusters
  – Expression data
  – Proteomics
  – Networks/pathways
  – Biochemical reactions
Integrate pre- and post-1.4 exception systems
Modify the change notification system
  – Better synchronization and transaction support
  – Easier to optimize events that don’t have listeners
  – More robust handling of event cascades
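
The goal of optimizing events that have no listeners can be illustrated with a lazily built event: the (possibly expensive) event object is only constructed when someone is registered to receive it. The sketch below is an assumption about what such an optimization could look like. `ChangeSupportDemo`, `Listener` and `fire` are hypothetical names, not the BioJava change notification API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration: skip building events nobody will receive. */
final class ChangeSupportDemo {
    interface Listener {
        void changed(String description);
    }

    private final List<Listener> listeners = new ArrayList<>();
    int eventsBuilt = 0; // counts how often an event was actually constructed

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** The supplier is only invoked when at least one listener is registered. */
    void fire(Supplier<String> event) {
        if (listeners.isEmpty()) {
            return; // nobody is listening, so the event is never built
        }
        String description = event.get();
        eventsBuilt++;
        // Iterate over a copy so listeners may register others while being notified.
        for (Listener listener : new ArrayList<>(listeners)) {
            listener.changed(description);
        }
    }

    public static void main(String[] args) {
        ChangeSupportDemo support = new ChangeSupportDemo();
        support.fire(() -> "feature added");  // no listeners yet: supplier not called
        support.addListener(d -> System.out.println("got: " + d));
        support.fire(() -> "feature removed");
        System.out.println("events built: " + support.eventsBuilt);
    }
}
```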

13 What Will We See in BioJava 2?
Pervasive use of Ontologies
  – Storing annotating data
  – Definition of processing pipelines (e.g. customizing parsers)
  – Bindings between BioJava interfaces and external data sources (DAS, BioSQL, BioCORBA)
  – Pervasive querying making any BioJava application an Object Data Store with easy routes for data-providers to optimize searches
Much more code generation
  – Push most repetitive code into code generators
  – Auto-generate much of the event notification web
Much better transactionality
Reduce implementation cost for developers
Expose any/all BioJava instances through SOAP Naming and Directory Services

14 And the Biggest Change of All?
Make the library accessible to casual developers for writing throw-away scripts as well as system architects
  – Documentation
  – Tutorials
  – Training
  – Utility classes (e.g. SeqIOTools)

15 Some Contributors
Brian Gilman, Brian King, Brian Osborne, Colin Hardman, David H. Klatte, David Huen, David Waring, Gerald Loeffler, Greg Cox, Hanning Ni, Jason Stajich, Kalle Näslund, Keith James, Kim Rutherford, Lei Lai, Mark Schreiber, Martin Senger, Mathieu Wiepert, Matthew Pocock, Michael Jones, Mike Jones, Nimesh Singh, Ron Kuhn, Samiul Hasan, Simon Brocklehurst, Stuart Johnston, Thad Welch, Thomas Down, Tim Dilks
O|B|F
