your name Dr Karol Kozak ETH Zurich May 2009, Paris OME Large Scale Biological Data Handling and Analysis Using Open Source Alternatives
your name Bioinformatics STORAGE Data acquisition Instrument management Image processing Normalization, QC Archiving Data storage Data mining Large Scale Experiments and Informatics Data flow
your name Detection points Data Automation: - Library Handling - Robotics - Microscopy - Image processing - Cell reliability - Hit Definition
your name Database architecture (LIMS) MySQL/ PostgreSQL Oracle/DB/SQL Server WEBAPPLICATION DB SCHEME WEB USER INTERFACE Import / Export Read/Write Search Database architecture Command Line Client External Bioinformatics Databases : NCBI, PubMed, SCOPE, PDB, Harvester, …
your name Link to microscope STORAGE
your name Link to microscope STORAGE
your name Link to microscope STORAGE Olympus SCAN R
your name Data archiving Microscope –format BD ThermoFisher - tiff Zeiss - lsm Leica - lei Olympus MD JPEG ~2MB Results Database DB After Screen In Screening Process Tape Convert to jpeg 50 x smaller Image processing Data Management Tool ~100MB + numeric results plots
your name JPEG ~0.5MB Results Database DB After Screen In Screening Process ETH NAS Users Tape 2x25TB LMC (6 weeks) *.tiff ETH NAS User storage Buffer server LMC interface- user + numeric results plots Data Management Tool
your name Dataflow architecture siRNA Inhibit. Library IMAGE SERVER Library DB Screen DB HitBase Published screens Screening process Library checker Data mining Bioinformatics Plates with compounds Screen results Protocol Microscope images Booking system Equipment manager Image analysis Post-processing Pre-processing We need software (Library Handling, QC, Data Mining, Visualization, Export, Flexibility)
your name HC/DC
your name Read data from database
your name Link with image storage
your name Data mining STORAGE LIMS Microscope File system Statistics -One/two parameters -Normalization -Correlation -Compare distribution -Ex. Statistical tests, Z- score, Z’, etc Pattern recognition – Machine Learning -Multi-parameters -Clustering -Dimensionality reduction -Check parameter importance
your name Classification - Unsupervised learning (Cluster analysis, Clustering) seeks to discover the classes ?
your name Classification problem ? - Supervised learning (Classification) assumes classes are known
your name 2 class problem ? Positive controlNegative control
your name Train data for supervised learning 5 people
your name Train data for supervised learning WEKA/R-Project nodes (KNIME) 92% Accuracy Niels Landwehr, Mark Hall, Eibe Frank (2005). Logistic Model Trees.
your name HC/DC Type I 1 or 2 Input(s) and 1 or 2 Output(s) Type II 1 Input (mostly for visualization) Type III 1 Output (mostly data readers) Each node has submenu (right mouse button) Input data Output data Settings: Parameters Rules Algorithms rules Filters Settings
your name HC/DC nodes WEKA/R-Project nodes (KNIME) KNIME nodes HC/DC
your name Next-Generation sequencing nodes Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data Roche 454
your name Next-Generation sequencing nodes Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data SOLEXA Illumina
your name Proteomics Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data
your name Drag-and-drop
your name Pioneering study
your name Plot image parameters/descriptors
your name Compound screening
your name Menu HC/DC
your name Library Handling Data Automation: - Volume - Concentration - Dilutio - Splitting - Take liquid - Barcode
your name Workflows for HCS
your name Workflows for HCS
your name Workflows for HCS
your name Workflows for HCS
your name Link to OME/OMERO
your name Acknowledgements & Partners: Contribution in development: MPI-CBG, Dresden Eberhard Krausz (former) & team Eugenio Fava & team Marc Bickle & team ETH Zurich LMC-RISC: Gabor Csucs & team SystemX Adrian Honegger & team MPI-IB, Berlin Nikolaus Machuy & team
your name HCDC: Workshops - Webinar: pmhttp://hcdc.ethz.ch WebPage Workshops HCDC+ KNIME: days ETH Zurich