Biological Oceanography Scientific Domain Ed DeLong MIT Department of Biological Engineering Department of Civil and Environmental Engineering DataSpace.
BIOLOGICAL OCEANOGRAPHY
- Coupling of physical & biological oceanographic processes
- Comparative ecosystem analysis
- Biodiversity, biomass and productivity
- C-N-P cycling and energy flow
- Production, consumption of greenhouse gases: climate
- Measurement, modeling and experiments with microbial communities in the sea
- Education, training and knowledge exchange
Scope & Scale: Challenges in Biological Oceanography (Genomes to Biomes…)
Oceanographic sampling approaches in the context of scales
[Figure: microbial and sampling scales, based on Dickey (1991) and Allen (2000); Ricardo Letelier]
ADVANCED INSTRUMENTATION
Continuous, autonomous collection of 4D physical, chemical and bio-optical datasets
- 2 eddies
- 1 frontal system
- Sub-mesoscale features?
- Higher Chl a below cyclone
- DCM constant
- Patchy distribution of small particles
- Advection/local production of small particles in the Z_e
Further specialization: Marine Metagenomics
- Traditional microbiology and microbial genome sequencing studies rely on cultivated cultures
- Marine metagenomics: DNA sequences of microbial assemblages from the environment
- Metagenomic data is used by scientists across multiple disciplines, e.g.:
  - Biological engineering & biotechnology
  - Genomics and computational biology
  - Ecology and environmental science
  - Climate: relationship between marine microbes & the ocean’s carbon cycle, productivity, greenhouse gases
[Figure: H179_454DNA_vs_Pelagibacter; panels labeled 25 m, 75 m and 125 m]
2nd Gen Sequencing Platforms

                       | AB3730                 | 454 FLX/Titanium   | Illumina
Cost per run           | ~$50                   | <$12K              | <$5K
Bases read/run         | 72 Kbp                 | 100 Mbp, 500 Mbp   | >2 Gbp, >200 Gbp !!!
Bases per read         | 750                    | 250, 450           | >36 (>100 + paired-end reads)
Reads per run          | 96                     | 400K               | 20M
$ per Mbp              | $694                   | $120               | $7
AB3730 work equivalent | -                      | 100x AB3730/day    | 300x AB3730/day
Errors                 | Diverse (cloning bias) | Homopolymeric runs | Diverse (base substitution)
Run time               | 1 hour                 | 6.5 hours          | 2-14 days*
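The "$ per Mbp" figures on this slide appear to follow from the other rows: bases per run is roughly reads per run times bases per read, and cost per Mbp is cost per run divided by that yield. The sketch below reproduces the arithmetic under stated assumptions: it uses the first value listed for each platform (750, 250 and 36 bases per read), treats the "~" and "<" costs as exact dollar amounts, and uses dictionary keys and a function name that are illustrative only and do not come from the slide.

```python
# Assumed inputs, read off the slide: cost per run (USD), reads per run,
# and bases per read. Bounds such as "<$12K" are treated as exact values.
platforms = {
    "AB3730": {"cost_usd": 50, "reads": 96, "bases_per_read": 750},
    "454 FLX": {"cost_usd": 12_000, "reads": 400_000, "bases_per_read": 250},
    "Illumina": {"cost_usd": 5_000, "reads": 20_000_000, "bases_per_read": 36},
}


def usd_per_mbp(cost_usd, reads, bases_per_read):
    """Cost per megabase: run cost divided by run yield in Mbp."""
    mbp_per_run = reads * bases_per_read / 1e6
    return cost_usd / mbp_per_run


for name, spec in platforms.items():
    print(f"{name}: ${usd_per_mbp(**spec):.0f} per Mbp")
# AB3730: $694 per Mbp
# 454 FLX: $120 per Mbp
# Illumina: $7 per Mbp
```

With these inputs the Illumina yield works out to 720 Mbp per run, so the $7 figure corresponds to 36-base reads rather than to the larger ">2 Gbp" and ">200 Gbp" yields listed on the same slide.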
Biological Oceanography Data Challenges
- Wide variety and heterogeneity of data types:
  - Oceanographic cruise data
  - Oceanographic time series data
  - Laboratory & field experiments
  - Remote sensing datasets
  - Data from gliders, AUVs & moorings
  - Genomics, metagenomics, gene expression data
  - Numerical simulations & synthesis products
- Distributed data (multi-institution & researchers)
- Need to balance PI, project & public data accessibility
- Data visualization & analysis needs
- Long term archiving requirements
Why do biological oceanographers need DataSpace?
DataSpace partners: MIT-OSU
Oceanographic Science Partners: Ed DeLong (MIT) & Ricardo Letelier (OSU)
Library IT Partners: MacKenzie Smith (MIT) & Terry Reese (OSU)
DeLong and Letelier are Co-PIs on three major projects:
- Center for Microbial Oceanography: Research and Education (C-MORE)
- Microbial Oceanography of Oxygen Minimum Zones (MOOMZ)
- Microbial diversity and activity in seasonal hypoxic waters (MI-LOCO)
Existing Data Portal
Currently a distributed approach, consisting of web links to individually managed, heterogeneous datasets.
http://cmore.soest.hawaii.edu/data.htm
Where is the data now? (Oceanographic data)
BCO-DMO: Biological and Chemical Oceanography Data Management Office database
http://osprey.bcodmo.org/index.cfm
Where is the data now? (Genomic/metagenomic)
Public Databases: NCBI and CAMERA
- National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/
- Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis: http://camera.calit2.net/
In-house Databases …
Why do biological oceanographers need DataSpace?
- Data access, storage and search are not centralized
- Large heterogeneous datasets
- Complex data management/sharing requirements
- Data shared across multiple institutions & investigators
- Long term requirements (2017)
- Need for search across investigators, institutions and projects
- Currently lots of data is “lost”, i.e. not usable
Why do biological oceanographers need DataSpace?
- How many autonomous surveys, cruises, mooring datasets, hydrocasts and deckboard experiments had chlorophyll concentrations above or below a threshold X?
- Of those data, how many had light levels and oxygen concentrations corresponding to Y and Z?
- Of those data, how many have corresponding microbial community taxonomic composition and gene content data? (retrieve)
- What is the relationship between light, chlorophyll, oxygen and microbial community taxonomic composition and gene content, across all datasets?
- How do taxa and gene content relate to oxygen levels and the balance of production and consumption? To greenhouse gas (GHG) levels?
- Are there specific gene proxies that predict oxygen or GHG levels?
Note: centralized data access, search and storage will also drive the way we (scientists) ask our questions and collect and annotate our data.
= A collaboration between scientists, IT, curators and database managers.
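The first three questions on this slide describe a staged search, where each step narrows the result of the previous one. A minimal sketch of that idea follows, assuming a single catalog of per-dataset summary records. The `CatalogRecord` class, its field names, the sample values and the reading of the chlorophyll criterion as "greater than X" are all hypothetical and are not taken from DataSpace or any existing portal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogRecord:
    """Hypothetical catalog entry; field names are illustrative only."""
    dataset_id: str
    kind: str                           # e.g. "cruise", "mooring", "hydrocast"
    chlorophyll: Optional[float] = None
    light: Optional[float] = None
    oxygen: Optional[float] = None
    has_taxonomy: bool = False
    has_gene_content: bool = False


def in_range(value, bounds):
    """True when a measurement exists and lies within (low, high) inclusive."""
    low, high = bounds
    return value is not None and low <= value <= high


def staged_search(records, chl_x, light_y, oxygen_z):
    """Apply the slide's three filters in sequence and return every stage."""
    # Step 1: chlorophyll above X (assumed direction of the comparison).
    with_chl = [r for r in records
                if r.chlorophyll is not None and r.chlorophyll > chl_x]
    # Step 2: of those, light and oxygen within the ranges Y and Z.
    with_env = [r for r in with_chl
                if in_range(r.light, light_y) and in_range(r.oxygen, oxygen_z)]
    # Step 3: of those, records with both taxonomy and gene content data.
    with_omics = [r for r in with_env
                  if r.has_taxonomy and r.has_gene_content]
    return with_chl, with_env, with_omics


# Made-up sample values, unitless, for illustration only.
catalog = [
    CatalogRecord("cruise-A", "cruise", 0.8, 120.0, 210.0, True, True),
    CatalogRecord("mooring-B", "mooring", 0.9, 15.0, 205.0, True, False),
    CatalogRecord("hydrocast-C", "hydrocast", 0.2, 110.0, 200.0, True, True),
    CatalogRecord("glider-D", "autonomous survey", 1.1, 100.0, 190.0),
    CatalogRecord("deck-E", "deckboard experiment", None, 130.0, 220.0, True, True),
]

step1, step2, step3 = staged_search(
    catalog, chl_x=0.5, light_y=(50.0, 200.0), oxygen_z=(180.0, 230.0)
)
print(len(step1), len(step2), len(step3))  # 3 2 1
```

The sketch only works because every record sits in one searchable collection with shared field names, which is the point the slide makes about centralized access and consistent annotation.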
The DataSpace Project & Biological Oceanography
- Provide infrastructure for digital archiving & preservation at appropriate scales, matching the scope/complexity of the data
- Enable more integrated intra- & inter-project collaborations, analyses, data encoding, documentation, sharing, visualizing, and preservation
- Establish standards & best practices to capture, express, encode and publish the policies related to archived data
- Enable new discoveries by facilitating access, search and storage of large, complex heterogeneous datasets
The DataSpace Project & Biological Oceanography
[Figure: GENOMES to BIOMES, with labels: community genomic and transcriptomic data; community metabolism; ecosystem functions; community composition and interactions]