BeeHive: a datamining tool at Biovitrum and iNovacia
Mats Dahlberg, Research Informatics, iNovacia AB, Sweden
ChemAxon UGM, Budapest, June 7 2006
Research Informatics Philosophy
All data in Oracle
  – Safe, pharma industry standard (e.g. many chemical cartridges: ChemAxon, MDL, Accelrys, ...)
  – Data is our asset. Programs come and go.
Integration through the database layer
  – ...but hidden from the users. Multiple front-ends allowed
Applications rapidly adapted to users' needs
  – Close connection between developers and users
  – Workflow support requires full control over the code
Unorthodox solutions are allowed
  – Sometimes quick and dirty development
  – Sometimes unstable code (but usually fixed quickly...)
  – Sometimes a non-standard technical platform (e.g. the Bee language)
BeeHive
Function
  – Main repository for (almost) ALL research data
  – Used by all project teams
  – Technical platform for various modules
Features
  – Advanced on-the-fly joins of DB tables
  – Versatile handling of lists (compounds, batches, projects...) and queries
  – Data grouping (one line per compound)
  – Fully customisable through meta-data; easy to add new branches (CBT, ELN stats etc.)
  – Structure searching through the ChemAxon Oracle cartridge
  – Built on the Bee language from MolSoft LLC, San Diego
Status
  – Moved from MDL's cartridge in 2006
  – Business critical. Approx. 250 users throughout R&D
The heart – just a SQL generator…
Defines column types and a cost for all joinable columns
All possible joins are pre-calculated (a travelling salesman problem, more than 300 tables)
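The idea on this slide can be illustrated with a minimal sketch. It is not BeeHive's code: the table names, join columns and costs are hypothetical, and the slide's travelling-salesman formulation is replaced here by a simpler all-pairs shortest-path pre-calculation (Floyd-Warshall), which is enough to show how pre-computed join paths can drive SQL generation.

```python
from itertools import product

# Hypothetical metadata: (table_a, table_b, shared join column, cost).
JOINS = [
    ("compound", "batch", "compound_id", 1),
    ("batch", "assay_result", "batch_id", 2),
    ("compound", "project", "project_id", 5),
    ("project", "assay_result", "project_id", 8),
]


def precompute_paths(joins):
    """Pre-calculate the cheapest join route between every pair of tables."""
    tables = sorted({t for a, b, _, _ in joins for t in (a, b)})
    inf = float("inf")
    cost = {(a, b): (0 if a == b else inf) for a, b in product(tables, repeat=2)}
    nxt = {}  # (start, end) -> next table to visit on the cheapest route
    for a, b, _, c in joins:
        if c < cost[a, b]:
            cost[a, b] = cost[b, a] = c
            nxt[a, b], nxt[b, a] = b, a
    # Floyd-Warshall: product() varies k slowest, as the algorithm requires.
    for k, i, j in product(tables, repeat=3):
        if cost[i, k] + cost[k, j] < cost[i, j]:
            cost[i, j] = cost[i, k] + cost[k, j]
            nxt[i, j] = nxt[i, k]
    return cost, nxt


def join_path(nxt, start, end):
    """Follow the pre-calculated next-hop table from start to end."""
    path = [start]
    while path[-1] != end:
        path.append(nxt[path[-1], end])
    return path


def build_sql(joins, nxt, start, end, columns):
    """Generate a SELECT that joins start to end along the cheapest route."""
    keys = {}
    for a, b, col, _ in joins:
        keys[a, b] = keys[b, a] = col
    path = join_path(nxt, start, end)
    sql = f"SELECT {', '.join(columns)} FROM {path[0]}"
    for left, right in zip(path, path[1:]):
        col = keys[left, right]
        sql += f" JOIN {right} ON {left}.{col} = {right}.{col}"
    return sql


cost, nxt = precompute_paths(JOINS)
sql = build_sql(JOINS, nxt, "compound", "assay_result",
                ["compound.compound_id", "assay_result.ic50"])
```

In this toy graph the generator routes from `compound` to `assay_result` through `batch` (total cost 3) rather than through `project` (total cost 13). With several hundred tables the pre-calculation is done once, so building a query at run time is only a lookup.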
Meta data structure
Define entities and clean up the dictionaries
  – Compound numbers, protein targets, batches, plasmids...
  – One source for every entity: possible to validate numbers, no misspellings, improved data quality
This is the core of integration, not a particular client or system
None of this comes out-of-the-box!
[Figure: example from Biovitrum; labels: Cross database client, Prog 1, Prog 2]
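As a small illustration of the "one source for every entity" point, the sketch below validates identifiers against a single dictionary per entity type. The dictionary contents and the identifier format are hypothetical placeholders; in the setup described above the dictionaries would live in the database rather than in code.

```python
# Hypothetical single-source dictionaries, one per entity type.
DICTIONARIES = {
    "compound": {"BVT000123", "BVT000456"},
    "target": {"TARGET_A", "TARGET_B"},
}


def validate(entity_type, value):
    """Return the value if it is registered for this entity type, else raise."""
    known = DICTIONARIES[entity_type]
    if value not in known:
        raise ValueError(f"unknown {entity_type}: {value!r}")
    return value


ok = validate("compound", "BVT000123")

try:
    validate("compound", "BVT00123")  # one digit short: a typical typo
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

Because every client checks against the same dictionary, a mistyped compound number is rejected at entry instead of silently creating a second, unlinked record.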
The BEE language
High-level object-oriented scripting language
The core interpreter can be extended with powerful modules for
  – Graphical user interface components
  – Database connectivity for Oracle, MySQL etc.
  – Molecular objects including chemical drawing
  – XML-friendly
Very compact and efficient code compared with e.g. Java
Platform independent (Linux, MacOS X and Windows)
Written by Ruben Abagyan & Eugene Raush
[Figure: code and its result]
BeeHive Overview
[Screenshots: activity, solubility, chemist etc.; query builder with structural searching; navigate through all tables]
Query builder
All unique values in drop-down lists
No hard-coded values
Easy to spot errors
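One way to get drop-down lists with no hard-coded values is to read the distinct values straight from the column being queried. The sketch below uses Python's built-in `sqlite3` as a stand-in for Oracle; the table, column and chemist names are hypothetical.

```python
import sqlite3


def dropdown_values(conn, table, column):
    """Return the sorted distinct non-null values of table.column."""
    # Identifiers cannot be bound as SQL parameters, so accept plain names only.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected table or column name")
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL ORDER BY {column}"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assay_result (compound_id TEXT, chemist TEXT)")
conn.executemany(
    "INSERT INTO assay_result VALUES (?, ?)",
    [("C1", "Andersson"), ("C2", "Berg"), ("C3", "Andersson"), ("C4", "Bergg")],
)
choices = dropdown_values(conn, "assay_result", "chemist")
```

Here the misspelt "Bergg" appears directly below "Berg" in the sorted list, which is what makes data-entry errors easy to spot.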
Extraction of data for SAR analysis
One compound per line
Average IC50 and SD values
Hill number from ActivityBase
Structure pop-up window
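The one-compound-per-line grouping can be sketched as follows. This is an illustration under assumptions, not BeeHive's implementation: the input rows are hypothetical, and an arithmetic mean with a sample standard deviation is used because the slide does not say which averaging is applied.

```python
from collections import defaultdict
from statistics import mean, stdev


def one_line_per_compound(rows):
    """Collapse (compound_id, ic50) measurements into one summary per compound."""
    grouped = defaultdict(list)
    for compound_id, ic50 in rows:
        grouped[compound_id].append(ic50)
    summary = {}
    for compound_id, values in grouped.items():
        # A standard deviation needs at least two measurements.
        sd = stdev(values) if len(values) > 1 else None
        summary[compound_id] = {
            "n": len(values),
            "mean_ic50": mean(values),
            "sd": sd,
        }
    return summary


# Hypothetical measurements: two for C1, one for C2.
rows = [("C1", 10.0), ("C1", 14.0), ("C2", 3.0)]
table = one_line_per_compound(rows)
```

Keeping the measurement count `n` next to the mean makes it clear when a value rests on a single measurement and therefore has no SD.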
Systems and applications: BeeHive modules that use JChem
CIMS
  – Chemical Inventory Management System
  – Keeps track of all chemicals (bottle history, location, risk phrases etc.)
  – Replaced the previous MDL system
  – Fully barcoded (bottles, shelves, people...)
  – Has improved compliance, reagent availability and speed of inventory work
Reagent Search
  – ACX database of chemical catalogues from CambridgeSoft
  – Cross-linked to CIMS
  – "Give me all amines under 250 Da and show in-house ones at the top of the list"
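The example query on the Reagent Search slide amounts to a filter followed by a two-level sort. In the sketch below the `is_amine` flag is assumed to come from a structure search and `in_house` from the inventory cross-link; both flags and the stock status of each reagent are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mol_weight: float  # molecular weight in Da
    is_amine: bool     # assumed result of a substructure search
    in_house: bool     # assumed result of the inventory cross-link


def search(reagents, max_weight=250.0):
    """Amines below max_weight, in-house first, then by increasing weight."""
    hits = [r for r in reagents if r.is_amine and r.mol_weight < max_weight]
    # False sorts before True, so "not in_house" puts in-house reagents on top.
    return sorted(hits, key=lambda r: (not r.in_house, r.mol_weight))


reagents = [
    Reagent("aniline", 93.13, True, False),
    Reagent("benzylamine", 107.15, True, True),
    Reagent("toluene", 92.14, False, True),
    Reagent("tribenzylamine", 287.40, True, True),
    Reagent("morpholine", 87.12, True, True),
]
names = [r.name for r in search(reagents)]
```

Toluene is dropped because it is not an amine and tribenzylamine because it is above the weight limit; the in-house amines are then listed ahead of the one that would have to be ordered.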
Systems and applications: BeeHive modules (contd.)
ChemSpec
  – Registration of all new compounds
  – Structure-based logic for new compounds and batches
  – BVT (iNo) number assignment
  – Connection point for analytical data and requests
  – Used by all medicinal and analytical chemists
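A minimal sketch of structure-based registration logic: a structure that is already known yields a new batch of the existing compound, while an unseen structure gets a new compound number. The class, the six-digit number format and the use of a plain string as structure key are all hypothetical; a real system would compare canonical structures through the chemical cartridge.

```python
class Registry:
    """Toy registry keyed on an (assumed canonical) structure string."""

    def __init__(self, prefix="BVT"):
        self.prefix = prefix
        self.compounds = {}  # structure key -> compound number
        self.batches = {}    # compound number -> number of batches so far

    def register(self, structure_key):
        """Return (compound number, batch number) for a submitted sample."""
        number = self.compounds.get(structure_key)
        if number is None:
            # New structure: assign the next compound number.
            number = f"{self.prefix}{len(self.compounds) + 1:06d}"
            self.compounds[structure_key] = number
            self.batches[number] = 0
        # Known or new, every submission becomes a new batch.
        self.batches[number] += 1
        return number, self.batches[number]


registry = Registry()
first = registry.register("CCO")
second = registry.register("CCO")        # same structure: new batch only
third = registry.register("c1ccccc1")    # new structure: new compound number
```

The decision between "new compound" and "new batch" is made purely from the structure, so chemists do not have to know whether a compound has been registered before.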
What is next on the list?
JChem
  – Calculated properties on all molecule databases (pKa, logP, logD, ...)
  – Generation of diverse screening sets on the fly (BCUT?)
  – ...
Summary - informatics
Data sharing is crucial
Excel is not enough!
No database, no modelling
Each organisation must define its own meta data
You need a database administrator
Define the data structure first; applications can be improved gradually
RI People and their roles
Mats Kihlén
  – MSc Eng Physics. Head of RI. Ex-computational chemist.
Mats Dahlberg
  – Developer. MSc Computer Science. All-purpose, any language.
Mikael Malmgren
  – DBA. Database architecture & maintenance. Chemical registration.
John Marelius
  – Developer. PhD Computational Chemistry. Lab automation.