SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester.


1 SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester Wolfgang Müller, O. Krebs, Isabel Rojas – EML Research gGmbH (=not for profit) Jacky Snoep - University of Stellenbosch MS eScience Workshop, Pittsburgh, PA

2 SysMO=SYStems biology of Micro Organisms (2) (29) (22) (9) (4) (1) 11 projects, 91 partners, 9 countries, started 2007

3 Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution: exchange of data, models and processes (intra- and inter-consortia). search for data, models and processes across the initiative. dissemination of results. SysMO-DB

4 SysMO-DB Team University of Stellenbosch, South Africa University of Manchester, UK Jacky Snoep EML Research gGmbH, Germany Isabel Rojas University of Manchester, UK Olga Krebs Wolfgang Müller Sergejs Aleksejevs Carole Goble Stuart Owen Katy Wolstencroft

5 Connect projects, connect to outside Project specific solutions Internally used tools & data Outside data and tools Project Public My Disk: Data Models Workflows Personal SysMO-DB, inter-project

6 Own solutions Suspicion Data issues Resource Issues Own data solutions and collaboration environments. wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets. Suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians. Many do not have data, or follow the standards that exist or know who is doing what. Much of the data cannot be compared Different organisms, different strains. No extra resources for the consortiums 91 institutes, 11 consortiums, some overlapping

7 Principles… Go for a series of small victories Realistic Don't reinvent Migrate to standards Sustainable and extensible Provide instant gratification Address doubt and anxiety Build it

8 Modellers Exchange Experimentalists Exchange Bioinformaticians Three types of people

9 "Natural" collaboration within SysMO Short, simplified, black and white: Collaboration during project design Varying methods of collaboration during project Binomes (one modeller, one experimentalist) Groups collaborating with groups (occasional/formalized exchange of information) Varying success → Need for a watering hole/meeting point → Application where experimentalists/bioinformaticians/modellers meet (Photo: "Hot Watering Hole Action" by betty x1138, NYC, USA, taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056) Trying to make experimentalists, modellers, bioinformaticians peacefully share resources

10 Some numbers & some consequences 1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist 11 projects, 91 partners 20 programmer days/year/project 2.5 programmer days/year/partner → "just in case" approach impossible → Focus on real needs → "just in time", "just enough" → The right 20% → Help people help themselves → Communication! 80-20 rule: 80% of the features won't be used anyway Useful features
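The per-project and per-partner figures on this slide can be roughly reproduced with a short calculation. This is only a sketch: the slide does not say how the numbers were derived, so the figure of about 220 working days per programmer per year is an assumption chosen because it yields the quoted 20 days per project.

```python
# Assumption (not stated on the slide): one software engineer contributes
# about 220 working days per year.
PROGRAMMER_DAYS_PER_YEAR = 220
PROJECTS = 11   # from the slide
PARTNERS = 91   # from the slide

days_per_project = PROGRAMMER_DAYS_PER_YEAR / PROJECTS
days_per_partner = PROGRAMMER_DAYS_PER_YEAR / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

Under this assumption the per-partner value comes out slightly below the 2.5 days quoted on the slide, which is consistent with the slide's point: there is far too little programmer time to build features "just in case".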

11 Social Approach Questionnaires PALs (Project Area Liaison) 21 Postdocs and PhD students Bio/bioinf/modeller Our design and technical collaboration team Very intense face to face and virtual collaboration UK and Continental PALS Chapters Audits and Sharing Methods, data, models, standards, software, schemas, spreadsheets, SOPs…..

12 Communication via PALs DB team PALs Projects Show what is there Suggest what is possible Ask for requirements Give requirements Tell priorities Rate outcomes Suggest improvements Double check Transmit Disseminate Collect answers

13 Need to find the guy who does xyz: Yellow pages Need to store Standard Operating Procedures Almost all our data is Excel Outcome of first PALs meeting:

14 What's there SysMO-SEEK screenshots

15 Yellow pages Tag clouds Bookmarks Yellow pages tabs ISA tabs

16 Standard Operating Procedures

17 JWS connection for modellers

18 View Study

19 New Assay (ISA)

20 Rights and sharing

21 Rights and sharing: create group

22 So much for the webapp Rights + Sharing Connection to modellers' tools Yellow pages SOPs

23 Almost there: Improved Excel support Matthew Horridge

24 Towards Just-Enough Exchange Incremental steps from beta to beta

25 Towards Just-Enough Exchange Largely a story about how to handle Excel sheets for the users' benefit

26 SysMO Just Enough Exchange COSMIC Alfresco BaCell-SysMO Alfresco MOSES Wiki SysMO-LAB Wiki SABIO-RK Public Resources SABIO-RK Spread sheets Spread sheets Spread sheets Spread sheets BASE

27 Need for tradeoff Huge number of systems Huge number of standards (MIBBI, OBO…) Some of them big standards Too much to cope with a few people, but: Comparison needs standardisation Search needs standardisation  Need to move incrementally to just-enough standard implementation

28 Path = goal The journey is part of the reward Let people use what they use anyway If changes necessary, be as unintrusive as possible Be aware of legacy data Nudge people towards best practises Give instantly useful added value to as many users as possible: Simple search, simple exchange, simple tool use

29 A roadmap Provide convincing Web 2.0 functionality for use and as appetizer Yellow pages SOPs Upload service: Hand-triggered upload of link/file Hand-added metadata Harvesting+change detection service Automatic download Hand-added metadata Support for Excel templates Promote internal standards by use + tooling Mappers + parsers Classifiers Use other data types where appropriate SBML, Matlab, Mathematica…
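The "Harvesting + change detection service" step of the roadmap can be illustrated with a small sketch. The slide does not describe how SysMO-DB detects changes, so the content-hash approach below, the function names `file_digest` and `detect_changes`, and the in-memory `seen` dictionary are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return a SHA-256 digest of the file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(folder: Path, seen: Dict[str, str]) -> List[Path]:
    """Return files that are new or modified since the last harvest.

    `seen` maps file path -> digest from earlier runs and is updated in place.
    (Hypothetical design: a real harvester would persist this state.)
    """
    changed = []
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        key = str(path)
        if seen.get(key) != digest:
            changed.append(path)   # new file or changed content
            seen[key] = digest
    return changed
```

Only the files returned by `detect_changes` would need to be downloaded again and have their hand-added metadata reviewed.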

30 Stability hierarchy Single group Single SysMO project Whole SysMO Template for a group of experiments More stable JERM data model Template best practise Project-level template Increasing stability Parsers/ annotators Enter into that Use mappers where needed

31 JERM Extraction Architecture [Diagram. Components shown: Project repositories, Harvester, Data handler, Classifier/Dispatcher, Template recognizer, Extractor, Parser, Mapper. Outputs shown: Data, Metadata.]
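The components named in the extraction architecture (template recognizer, extractor, mapper, with a dispatcher in front) can be sketched as a small pipeline. This is not the SysMO-DB implementation: the `Template` class, the header-matching rule, and every field name below are hypothetical and serve only to show how such stages might fit together.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = List[str]


@dataclass
class Template:
    name: str
    header: List[str]         # column headers that identify the template
    mapping: Dict[str, str]   # template column -> common-model field (hypothetical)


def recognize(rows: List[Row], templates: List[Template]) -> Optional[Template]:
    """Template recognizer: match the first row against known headers."""
    if not rows:
        return None
    header = [cell.strip().lower() for cell in rows[0]]
    for template in templates:
        if header == [h.lower() for h in template.header]:
            return template
    return None


def extract(rows: List[Row], template: Template) -> List[Dict[str, str]]:
    """Extractor: turn each data row into a record keyed by template column."""
    return [dict(zip(template.header, row)) for row in rows[1:]]


def map_record(record: Dict[str, str], template: Template) -> Dict[str, str]:
    """Mapper: rename template columns to common-model field names."""
    return {template.mapping[k]: v for k, v in record.items() if k in template.mapping}


def process(rows: List[Row], templates: List[Template]) -> List[Dict[str, str]]:
    """Dispatcher: recognized sheets are extracted and mapped, others are skipped."""
    template = recognize(rows, templates)
    if template is None:
        return []   # unrecognized sheet: left for hand-added metadata
    return [map_record(record, template) for record in extract(rows, template)]
```

A sheet whose header matches a registered template yields records in the common model; anything else falls through untouched, which matches the idea of letting people keep using what they already use.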

32 Oops Some projects not prolonged Need all project data in the system fast, so…

33 JERM Extraction Architecture [Diagram repeated. Components shown: Project repositories, Harvester, Data handler, Classifier/Dispatcher, Template recognizer, Extractor, Parser, Mapper. Outputs shown: Data, Metadata.]

34 Lessons we're learning Some interesting bits along the way

35 Subsetting: Don't overwhelm Standards need to be comprehensive Goal: "Minimum information"… (MIBBI) Tends to be superset of what is needed for a project Example for non-applicable attributes Tissue of a single cell Gender → Useful to use adapted subset-templates Experimental design selection list
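The idea of an adapted subset-template can be shown in a few lines. The checklist below is invented for illustration (it is not an actual MIBBI checklist); only "tissue" and "gender" come from the slide's examples of non-applicable attributes.

```python
from typing import Iterable, List

# Hypothetical "full" checklist; a real minimum-information standard is much larger.
FULL_CHECKLIST = ["organism", "strain", "tissue", "gender", "growth_medium", "temperature"]


def subset_template(full: List[str], not_applicable: Iterable[str]) -> List[str]:
    """Return the checklist without attributes that do not apply to a project."""
    excluded = set(not_applicable)
    unknown = excluded - set(full)
    if unknown:
        raise ValueError(f"Not part of the checklist: {sorted(unknown)}")
    return [attribute for attribute in full if attribute not in excluded]


# A micro-organism project would drop the attributes the slide calls non-applicable.
microbe_template = subset_template(FULL_CHECKLIST, ["tissue", "gender"])
print(microbe_template)
```

Because the subset is derived from the full checklist rather than written from scratch, data entered with it remains compatible with the comprehensive standard.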

36 From biofolksonomy to ontology Observation: Fast growing set of standards Standards are moving target Incremental approach Keyword annotation Controlled selection lists Home-brewed taxonomies Use/contribution to standard ontologies Provide migration tools Tags + suggestions Home-brewed taxonomy
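The "Tags + suggestions" step, which nudges free keywords towards a controlled list, might look like the sketch below. The vocabulary is a made-up example and the fuzzy matching via the standard library's `difflib.get_close_matches` is an assumption; the slide does not say how suggestions are produced.

```python
from difflib import get_close_matches
from typing import List

# Hypothetical controlled selection list.
CONTROLLED_TERMS = ["glycolysis", "chemostat", "transcriptomics", "metabolomics"]


def suggest_terms(tag: str, vocabulary: List[str], limit: int = 3) -> List[str]:
    """Suggest controlled terms that closely resemble a free-text tag."""
    by_lowercase = {term.lower(): term for term in vocabulary}
    matches = get_close_matches(
        tag.strip().lower(), list(by_lowercase), n=limit, cutoff=0.7
    )
    return [by_lowercase[match] for match in matches]
```

A user typing a misspelled or variant tag is offered the closest controlled terms, while tags with no close match are kept as plain keywords, so the folksonomy can grow before it is migrated to a taxonomy or ontology.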

37 A word on software Template tooling Excel Java SysMO-SEEK (open source under Apache license) Ruby on Rails Convention over configuration Libraries & plugins Rails specific (e.g. acts_as_authenticated) SOLR & Lucene introduce Java/Ruby Database: MySQL also tested with SQLite (exclude db dependencies)

38 Summary SysMO-DB as a virtual meeting point for different flavours of systems biologists SysMO-DB's mantra: Just enough, just in time Flexible JERM extraction architecture Just enough metadata (incremental) A lot done, still a lot to do

39 Challenges ahead… Social PALs work great and motivated Now need moremoremore datadatadata Technical Publishing into public repositories Search + exploration: The test for data quality Hierarchical Faceted Search Distributed search via Taverna workflows More workflows via SysMO-SEEK Improve modelling support

40 Bonus track: what if… …the average data quality is below par? → "Nagging functionality" Remind people of potentially faulty metadata Give suggestions what to improve and how Give possibility to create automatic mappings
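A minimal sketch of such "nagging functionality" follows. The required fields, the placeholder values treated as suspicious, and the message wording are all assumptions for illustration, not features confirmed by the slides.

```python
from typing import Dict, List

# Hypothetical values that usually indicate metadata filled in without real content.
PLACEHOLDERS = {"n/a", "unknown", "?", "tbd"}


def nag_messages(metadata: Dict[str, str], required: List[str]) -> List[str]:
    """Return reminders for missing or potentially faulty metadata fields."""
    messages = []
    for name in required:
        value = metadata.get(name, "").strip()
        if not value:
            messages.append(f"'{name}' is missing: please add a value.")
        elif value.lower() in PLACEHOLDERS:
            messages.append(f"'{name}' looks like a placeholder ('{value}'): please check it.")
    return messages


reminders = nag_messages(
    {"organism": "Bacillus subtilis", "strain": "tbd"},
    required=["organism", "strain", "growth_medium"],
)
for reminder in reminders:
    print(reminder)
```

Each reminder names the field and says what to do, in line with the slide's suggestion to tell people what to improve and how.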

41 Thanks EML People: Isabel Olga UMAN People: Carole Katy Finn Stuart Sergejs Jacky at Stellenbosch BBSRC BMBF KTF …and Microsoft for sponsoring this workshop

42 www.sysmo-db.org End + questions

43 END

