Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK.


1 Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK Matthew T. Dougherty NCMI - Baylor College of Medicine Houston, Texas

2 Bioimage Informatics Informatics in support of biological imaging Why? Image data rapidly increasing (Confocal) Fluorescence microscopy (Cellular B.) EMDB: Electron Microscopy (Structural Biology) High-throughput methods (Genome Biology) Enabling science by making data accessible, reliable, and understandable Standards & Conventions Public Databases Quality assessment Open Microscopy Environment S. Haertel, U. Chile J. Swedlow, U. Dundee EMDB, EBI

3 Structural Databases at EBI Protein Databank (PDB) Atomic structures (positions of atoms) PDB file format, mmCIF Derived from X-ray crystallography Long tradition, curated database Huge: 65,000+ entries, 3 wwPDB sites Electron Microscopy Databank (EMDB) Part of PDB at EBI and Rutgers 600 density maps of macromolecular structures and subcellular complexes Started 2002 Curated, but limited metadata, experiment info XML-based

4 SCIENTIFIC BACKGROUND

5 Electron microscope From Schweikert, 2004 Biocenter, U Helsinki

6

7 Single-particle method Tripeptidyl-peptidase II (TPP II) courtesy of B. Rockel, Martinsried Molecular structure Many images computationally combined 3D from 2D Resolution increase by averaging
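The "resolution increase by averaging" point can be illustrated with a toy example. The sketch below is not the reconstruction pipeline used for TPP II; it only assumes that many noisy copies of the same signal are already aligned, and shows that averaging N copies reduces the noise by roughly a factor of sqrt(N). The signal, noise level, and stack size are made-up values.

```python
import random

def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to every pixel."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def average_images(images):
    """Pixel-wise mean over a stack of equally sized, already aligned images."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(estimate, signal):
    """Root-mean-square difference between an estimate and the true signal."""
    total = sum((e - s) ** 2 for e, s in zip(estimate, signal))
    return (total / len(signal)) ** 0.5

rng = random.Random(0)                        # fixed seed for repeatability
signal = [float(i % 8) for i in range(512)]   # hypothetical 1D "particle"
sigma = 5.0                                   # assumed noise level per pixel

single = noisy_copy(signal, sigma, rng)
stack = [noisy_copy(signal, sigma, rng) for _ in range(400)]
averaged = average_images(stack)

err_single = rms_error(single, signal)
err_avg = rms_error(averaged, signal)
print(f"one image: {err_single:.2f}, average of 400: {err_avg:.2f}")
```

With 400 copies the expected improvement is about sqrt(400) = 20. Real single-particle analysis additionally has to find the orientation of every particle before anything can be averaged, which is where most of the computing time goes.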

8 Single-particle analysis: GroEL to 4A Ludtke et al, Structure 2008

9 Data Management Issues Initial EM images: O(1000), 4k x 4k -> O(10 GPixel) Particle stacks: O(100,000), 256x256 -> O(10 GPixel) Final data set: 1 MVoxel (small) Processing power: O(100) cores, some weeks, lab-owned clusters Software: 1970s FORTRAN codes, 1990s C codes, fragmented communities, lack of standards
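The order-of-magnitude figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that "4k" means 4096 pixels and takes the O(...) counts literally as 1000 and 100,000; the 2 bytes per pixel used for the storage estimate is an assumption, not a figure from the slide.

```python
def gigapixels(count, width, height):
    """Total number of pixels in `count` images of width x height, in units of 1e9."""
    return count * width * height / 1e9

raw_images = gigapixels(1000, 4096, 4096)       # initial EM images
particle_stacks = gigapixels(100_000, 256, 256)  # boxed-out particles

BYTES_PER_PIXEL = 2  # assumption: 16-bit detector output
raw_gigabytes = raw_images * BYTES_PER_PIXEL

print(f"raw images:      {raw_images:.2f} GPixel")
print(f"particle stacks: {particle_stacks:.2f} GPixel")
print(f"raw storage:     {raw_gigabytes:.1f} GB at {BYTES_PER_PIXEL} bytes/pixel")
```

Both intermediate data sets come out near 10 GPixel (about 16.8 and 6.6), while the final 1 MVoxel map is roughly four orders of magnitude smaller.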

10 Electron tomography 3D reconstruction by taking a series of images from different angles Difficulty: Nanometer accuracy Problems: Limited tilt range ↔ missing wedge ⇒ distortion Imperfections of the tilt ↔ alignment ⇒ limited resolution Computational reconstruction algorithms
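The link between a limited tilt range and the missing wedge can be made quantitative. For a single-axis tilt series covering -θ to +θ, the directions that are never sampled in Fourier space form a wedge with a fraction (90° - θ)/90° of the full angular range. The sketch below only evaluates this geometric relation; the function name and the example tilt ranges are illustrative.

```python
def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled by a single-axis
    tilt series that covers -max_tilt_deg .. +max_tilt_deg."""
    if not 0.0 <= max_tilt_deg <= 90.0:
        raise ValueError("maximum tilt angle must lie between 0 and 90 degrees")
    return (90.0 - max_tilt_deg) / 90.0

for tilt in (45, 60, 70, 90):
    print(f"+/-{tilt} deg: {missing_wedge_fraction(tilt):.1%} of Fourier space missing")
```

A tilt range of ±60° leaves one third of Fourier space empty, which is the source of the anisotropic distortion mentioned on the slide.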

11 Tomography of eukaryotic cells PROJECTION SLICE O. Medalia et al, Science, 2002 Dictyostelium discoideum

12 Image enhancement Before Cytoskeleton of Spiroplasma melliferum J. Kürner et al., Science, 2005

13 Image enhancement After yellow: geodetic line J. Kürner et al., Science, 2005

14 Automated image analysis Manual Automatic A. Linaroudis, Ph.D. Thesis, 2006 Automatic segmentation to identify points/lines/surfaces

15 Data Management Issues Original data: 60 images, 8k x 8k -> O(4 GPixel) Reconstruction: 8k x 8k x 256 -> O(16 GPixel) ? Software: 1970s algorithm in 1990s software Visualization: “let's buy more memory” Future: web-based applications (Google Maps) ?
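The "let's buy more memory" remark follows directly from the size of the reconstructed volume. The sketch below assumes that "8k" means 8192 and that each voxel is stored as a 4-byte float; the second assumption is not stated on the slide.

```python
# Tilt series: 60 projections of 8192 x 8192 pixels
tilt_series_pixels = 60 * 8192 * 8192

# Reconstructed tomogram: 8192 x 8192 x 256 voxels
volume_voxels = 8192 * 8192 * 256

BYTES_PER_VOXEL = 4  # assumption: 32-bit floating point values
volume_gib = volume_voxels * BYTES_PER_VOXEL / 2**30

print(f"tilt series: {tilt_series_pixels / 1e9:.2f} GPixel")
print(f"tomogram:    {volume_voxels / 1e9:.2f} GVoxel")
print(f"in memory:   {volume_gib:.0f} GiB")   # 64 GiB under the assumption above
```

Holding the full tomogram in memory would take 64 GiB under these assumptions, which is why tiled, web-based viewing in the style of Google Maps is raised as an alternative.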

16 The Electron Microscopy Data Bank contains EM-derived density maps complementary to coordinate sets in PDB established 2002 @ EBI (Kim Henrick) web-based submission and retrieval hand-curated (R. Newman) A bit like eBay – and you won't make any money, either

17 THE ELECTRON MICROSCOPY DATA BANK

18 A Unified Data Resource for EM NIH-funded joint project Baylor College of Medicine, Houston (W. Chiu, M. Baker) Rutgers University, New Jersey (H. Berman, C. Lawson) PDBe, EBI, Cambridge, UK (K. Henrick, C. Best, R. Newman) Baylor College of Medicine, Houston, TX Rutgers University, Piscataway, NJ European Bioinformatics Institute, Cambridge, UK

19 Characteristics Curated Community Archive: PDB and EMDB NIH, EU (in past), and BBSRC funding (+ EMBL) Worldwide cooperation Advisory boards and task forces from the community Open deposition and retrieval → Alternative access systems by other institutions 760 entries, 26 GB data, ca. 100 entries/year Curation both in Europe and US

20 Growth of EMDB

21 EMDep deposition system 750 entries, current rate approx. 15-20/month Contents of an entry: Metadata (XML header) → experimental metadata Map (any format, converted to CCP4/MRC) Additional files Java/Tomcat/XML
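Since every deposited map is converted to CCP4/MRC, it may help to see how little is needed to read the basic layout of such a file. In this format the file starts with a 1024-byte header whose first four 32-bit integers hold the grid dimensions (NX, NY, NZ) and the data mode. The sketch below is not EMDep code; it assumes little-endian byte order and parses a synthetic header built in memory rather than a real entry.

```python
import struct

HEADER_SIZE = 1024  # fixed header length of the MRC/CCP4 map format
MODE_NAMES = {0: "int8", 1: "int16", 2: "float32"}  # the most common data modes

def read_map_dimensions(header: bytes) -> dict:
    """Extract grid size and data mode from the start of an MRC/CCP4 header.

    Assumes little-endian byte order; a robust reader would also check the
    machine stamp stored further into the header.
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("header is shorter than 1024 bytes")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    return {"nx": nx, "ny": ny, "nz": nz,
            "mode": MODE_NAMES.get(mode, f"mode {mode}")}

# Synthetic header for a 64 x 64 x 64 float32 map (illustrative values only)
fake_header = struct.pack("<4i", 64, 64, 64, 2).ljust(HEADER_SIZE, b"\0")
info = read_map_dimensions(fake_header)
print(info)
```

Everything beyond these few numbers (sample, microscope, processing steps) lives in the XML metadata header of the entry, not in the map file.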

22 Unified data resource plan

23 Joint deposition system

24 EMDB search system Java/Tomcat

25 EMDB search system Java/Tomcat

26 EMDB Atlas pages XSLT

27 ISSUES

28 Metadata management Difficult: many rounds of consulting the community Still most fields remain empty Data harvesting LIMS, PIMS -> rarely used Processing pipelines, image processing software -> Lack of standards, idiosyncrasies Image formats: Appalling lack of standards

29 Data issues Current: Deposit final result of experiment and computation How much of original/intermediate data should be deposited? Issues: Cost / Practicability Reproducibility of experiment Intellectual property (un-exploited results?) Usefulness

30 Non-data issues Embargo: Image data can be withheld up to two years Allows original researcher to further exploit them Journals and funders must define: what data must be deposited when they are to be released Quality Standards: Require community acceptance Technically difficult Data Bank does enrich/annotate, but does not do science → quality standards must be set by scientists

31 Image data formats Current: Variety of historical ad hoc formats Unclear definitions, variations in different software Need: Interoperability Standards Technical level? Acceptance? → Question for the community HDF5 Common container format to deal with numerical data Heavyweight library, but widely available (but Java?) Would at least solve low-level format problems Metadata format still needs to be specified

32 Ontologies Systematic way to define classes of objects attributes of these objects relationships between objects Provides framework for metadata models Advantage: Powerful formal method Disadvantage: Not yet widely used
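The three ingredients named on this slide (classes, attributes, relationships) can be sketched in a few lines. This is only a toy model to show the structure; it does not follow any existing ontology language or the EMDB data model, and all class, attribute, and relation names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyClass:
    """A class of objects with its own attributes and an optional parent class."""
    name: str
    parent: "OntologyClass | None" = None
    attributes: dict = field(default_factory=dict)

    def is_a(self, other: "OntologyClass") -> bool:
        """True if this class is `other` or a descendant of it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_attributes(self) -> dict:
        """Own attributes merged with those inherited from parent classes."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}

# Hypothetical class hierarchy with attributes
imaging = OntologyClass("ImagingMethod", attributes={"instrument": "string"})
tomography = OntologyClass("ElectronTomography", parent=imaging,
                           attributes={"tilt_range_deg": "float"})

# Relationships between objects, written as (subject, relation, object)
relations = [("DensityMap", "produced_by", "ElectronTomography")]

print(tomography.is_a(imaging), tomography.all_attributes())
```

A metadata model built on such a framework can validate entries automatically, for example by checking that a tomography entry supplies every attribute required for its class and its parents.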

33 TECHNICAL DEVELOPMENTS

34 Rich data sets Submissions consist of maps (increasingly more than one) relations between data sets → unexpressed XML-based standards for representing relationships between data: Subject-predicate-object relationships (RDF framework) Harvesting interface to EM processing software Web-based visualization for submission and retrieval, complex submissions assembled interactively (AJAX)
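Subject-predicate-object relationships between the parts of a submission could be held as plain triples and written out as XML. The sketch below is an illustration only: the identifiers and predicate names are invented, and the XML layout is a simplified stand-in, not RDF/XML syntax and not the EMDB schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical relations within one submission: (subject, predicate, object)
triples = [
    ("map-A", "derivedFrom", "particle-stack-1"),
    ("map-B", "derivedFrom", "particle-stack-1"),
    ("map-B", "refinementOf", "map-A"),
]

def objects_of(triples, subject, predicate):
    """All objects linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def to_xml(triples):
    """Serialize the triples as one <relation> element each."""
    root = ET.Element("relations")
    for s, p, o in triples:
        ET.SubElement(root, "relation", subject=s, predicate=p, object=o)
    return ET.tostring(root, encoding="unicode")

print(objects_of(triples, "map-B", "refinementOf"))
print(to_xml(triples))
```

Keeping the relations as data rather than as free text in a comment field is what makes them searchable, which is the gap the slide marks as "unexpressed".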

35 Rich data submissions

36 Possible XML representation

37 Bioimage informatics tools Current EMDB interface: simple and efficient but must be extended to accommodate more complex experiments OMERO interface: geared at labs, not public databases All the beauty of AJAX high-performance visualization

38 Bioimage informatics tools BISQUE/BISQUIK (UCSB) multichannel images lab notebook tagging image markup

39 No Standards Experiment? Image? Analytics? Annotations? Current Imaging Workflow Paradigm Jason Swedlow (U. Dundee)

40 Towards Image Informatics

41 OMERO in 2007/8/9 Jason Swedlow (Univ. Dundee)

42 CONCLUSIONS

43 Imaging Centers USERS Databases Grid/cloud computing /storage in house storage storage and computing engines data submission data harvesting acquisition, storage, and management of images storage distribution quality assessment Software A Virtual Research Community

44 CONCLUSIONS Community databases are a central part of the Scientific Data Infrastructure Image databases rapidly growing Technical challenges: data formats, size Standards and interoperability Improve metadata collection Keep the community engaged

