Your name Dr Karol Kozak ETH Zurich May 2009, Paris OME Large Scale Biological Data Handling and Analysis Using Open Source Alternatives.

Slides:



Advertisements
Similar presentations
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
Advertisements

C.-C. Chan Department of Computer Science University of Akron Akron, OH USA 1 UA Faculty Forum 2008 by C.-C. Chan.
CaRE Center Informatics NHLBI CaRE Center Meeting Bethesda, MD July 25, 2006 Marcia Nizzari.
Visual Documentation v User Interface Active class (for selection and some processes)
Data warehouse example
Laboratory Information Management Systems
NYU Microarray Database (NYUMAD)
Data-intensive Computing: Case Study Area 1: Bioinformatics B. Ramamurthy 6/17/20151.
TREMA Tree Management and Mapping software Raintop Computing - Oxford.
Kate Milova MolGen retreat March 24, Microarray experiments: Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Department of Computer Science, University of Waikato, New Zealand Eibe Frank WEKA: A Machine Learning Toolkit The Explorer Classification and Regression.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Application architectures
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Tunable Machine Vision-Based Strategy for Automated Annotation of Chemical Database ChemReader Jungkap Park, Gus R. Rosania, and Kazuhiro Saitou University.
Kate Milova MolGen retreat March 24, Microarray experiments. Database and Analysis Tools. Kate Milova cDNA Microarray Facility March 24, 2005.
Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Copyright © 2006 by The McGraw-Hill Companies,
Visual Documentation v User Interface Active class (for selection and some processes)
DASHBOARDS Dashboard provides the managers with exactly the information they need in the correct format at the correct time. BI systems are the foundation.
Laboratory Information Management Systems. Laboratory Information The sole product of any laboratory, serving any purpose, in any industry, is information.
Census Data Capture Challenge Intelligent Document Capture Solution UNSD Workshop - Minsk Dec 2008 Amir Angel Director of Government Projects.
Application architectures
High Volume Slide Scanning Architecture and Applications
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
 The Weka The Weka is an well known bird of New Zealand..  W(aikato) E(nvironment) for K(nowlegde) A(nalysis)  Developed by the University of Waikato.
Data Mining on the Web via Cloud Computing COMS E6125 Web Enhanced Information Management Presented By Hemanth Murthy.
Data Acquisition at the NSLS II Leo Dalesio, (NSLS II control group) Oct 22, 2014 (not 2010)
Analysis and Management of Microarray Data Dr G. P. S. Raghava.
September 2003 Aix en Provence Jonathon Blake EMBL Biochemical Instrumentation.
© What do bioinformaticians do?
Workflow management: motivation and vision
Appendix: The WEKA Data Mining Software
Chapter 1 Introduction to Data Mining
European Plant-to-Enterprise Conference October 27-28, 2009, Utrecht, The Netherlands Mdf MES Development Framework Massimiliano Papaleo.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
T.Jadczyk, Bioinformatics Applications in the Virtual Laboratory Bioinformatics Applications in the Virtual Laboratory Tomasz Jadczyk AGH University of.
Use of Hierarchical Keywords for Easy Data Management on HUBzero HUBbub Conference 2013 September 6 th, 2013 Gaurav Nanda, Jonathan Tan, Peter Auyeung,
OME-TIFF and Bio-Formats K. Eliceiri, E. Hathaway, M. Linkert, and C. Rueden
Image Workflow Processes Elspeth Haston, Robert Cubey, Martin Pullan & David J Harris.
Machine Learning with Weka Cornelia Caragea Thanks to Eibe Frank for some of the slides.
Final Review of ITER PBS 45 CODAC – PART 1 – 14 th, 15 th and 16 th of January CadarachePage 1 FINAL DESIGN REVIEW OF ITER PBS 45 CODAC – PART 1.
Bioinformatics Core Facility Guglielmo Roma January 2011.
Analysis and Management of Microarray Data Previous Workshops –Computer Aided Drug Design –Public Domain Resources in Biology –Application of Computer.
Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004 Su-Shing Chen, University of Florida
ITSC/University of Alabama in Huntsville ADaM System Architecture Rahul Ramachandran, Sara Graves and Ken Keiser Mathematical Challenges in Scientific.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
ITSC/University of Alabama in Huntsville ADaM version 4.0 (Eagle) Tutorial Information Technology and Systems Center University of Alabama in Huntsville.
Genome STRiP ASHG Workshop demo materials
Library Online Resource Analysis (LORA) System Introduction Electronic information resources and databases have become an essential part of library collections.
1 « Luxembourg, 18 April 2007 « Virtual Library of Official Statistics « Dissemination Working Group.
High throughput biology data management and data intensive computing drivers George Michaels.
Using Galaxy to build and run data processing pipelines Jelle Scholtalbers / Charles Girardot GBCS Genome Biology Computational Support.
Zach Miller Computer Sciences Department University of Wisconsin-Madison Supporting the Computation Needs.
DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.
© Copyright Mistras Group Inc MISTRAS GROUP CONFIDENTIAL Noesis Noesis specializes in Acoustic Emission (AE) data analysis including real-time software.
MESA A Simple Microarray Data Management Server. General MESA is a prototype web-based database solution for the massive amounts of initial data generated.
WEKA: A Practical Machine Learning Tool WEKA : A Practical Machine Learning Tool.
DATA MINING and VISUALIZATION Instructor: Dr. Matthew Iklé, Adams State University Remote Instructor: Dr. Hong Liu, Embry-Riddle Aeronautical University.
Image taken from: slideshare
A Database Platform for Bioinformatics
DATA MINING © Prentice Hall.
Data-intensive Computing: Case Study Area 1: Bioinformatics
Waikato Environment for Knowledge Analysis
WEKA.
Lab Automation Software
IMAODBC, The Hague, 5-9 sept 2005
Data Warehousing and Data Mining
Machine Learning with Weka
Control System Studio (CSS)
Objective Understand web-based digital media production methods, software, and hardware. Course Weight : 10%
Presentation transcript:

your name Dr Karol Kozak ETH Zurich May 2009, Paris OME Large Scale Biological Data Handling and Analysis Using Open Source Alternatives

your name Bioinformatics STORAGE Data acquisition Instrument management Image processing Normalization, QC Archiving Data storage Data mining Large Scale Experiments and Informatics Data flow

your name Detection points Data Automation: - Library Handling - Robotics - Microscopy - Image processing - Cell reliability - Hit Definition

your name Database architecture (LIMS) MySQL/ PostgreSQL Oracle/DB/SQL Server WEBAPPLICATION DB SCHEME WEB USER INTERFACE Import / Export Read/Write Search Database architecture Command Line Client External Bioinformatics Databases : NCBI, PubMed, SCOPE, PDB, Harvester, …

your name Link to microscope STORAGE

your name Link to microscope STORAGE

your name Link to microscope STORAGE Olympus SCAN R

your name Data archiving Microscope –format  BD  ThermoFisher - tiff  Zeiss - lsm  Leica - lei  Olympus  MD JPEG ~2MB Results Database DB After Screen In Screening Process Tape Convert to jpeg 50 x smaller Image processing Data Management Tool ~100MB + numeric results plots

your name JPEG ~0.5MB Results Database DB After Screen In Screening Process ETH NAS Users Tape 2x25TB LMC (6 weeks) *.tiff ETH NAS User storage Buffer server LMC interface- user + numeric results plots Data Management Tool

your name Dataflow architecture siRNA Inhibit. Library IMAGE SERVER Library DB Screen DB HitBase Published screens Screening process Library checker Data mining Bioinformatics Plates with compounds Screen results Protocol Microscope images Booking system Equipment manager Image analysis Post-processing Pre-processing We need software (Library Handling, QC, Data Mining, Visualization, Export, Flexibility)

your name HC/DC

your name Read data from database

your name Link with image storage

your name Data mining STORAGE LIMS Microscope File system Statistics -One/two parameters -Normalization -Correlation -Compare distribution -Ex. Statistical tests, Z- score, Z’, etc Pattern recognition – Machine Learning -Multi-parameters -Clustering -Dimensionality reduction -Check parameter importance

your name Classification - Unsupervised learning (Cluster analysis, Clustering) seeks to discover the classes ?

your name Classification problem ? - Supervised learning (Classification) assumes classes are known

your name 2 class problem ? Positive controlNegative control

your name Train data for supervised learning 5 people

your name Train data for supervised learning WEKA/R-Project nodes (KNIME) 92% Accuracy Niels Landwehr, Mark Hall, Eibe Frank (2005). Logistic Model Trees.

your name HC/DC Type I 1 or 2 Input(s) and 1 or 2 Output(s) Type II 1 Input (mostly for visualization) Type III 1 Output (mostly data readers) Each node has submenu (right mouse button) Input data Output data Settings: Parameters Rules Algorithms rules Filters Settings

your name HC/DC nodes WEKA/R-Project nodes (KNIME) KNIME nodes HC/DC

your name Next-Generation sequencing nodes Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data Roche 454

your name Next-Generation sequencing nodes Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data SOLEXA Illumina

your name Proteomics Type I 1 or 2 Input(s) and 1 or 2 Output(s) Input data Output data

your name Drag-and-drop

your name Pioneering study

your name Plot image parameters/descriptors

your name Compound screening

your name Menu HC/DC

your name Library Handling Data Automation: - Volume - Concentration - Dilutio - Splitting - Take liquid - Barcode

your name Workflows for HCS

your name Workflows for HCS

your name Workflows for HCS

your name Workflows for HCS

your name Link to OME/OMERO

your name Acknowledgements & Partners: Contribution in development: MPI-CBG, Dresden Eberhard Krausz (former) & team Eugenio Fava & team Marc Bickle & team ETH Zurich LMC-RISC: Gabor Csucs & team SystemX Adrian Honegger & team MPI-IB, Berlin Nikolaus Machuy & team

your name HCDC: Workshops - Webinar: pmhttp://hcdc.ethz.ch WebPage Workshops HCDC+ KNIME: days ETH Zurich