

The data flood: We need a bigger boat
James A. Foster
The Initiative for Bioinformatics and Evolutionary Studies (IBEST)
Biological Sciences, Bioinformatics and Computational Biology
University of Idaho

2 JAF INBRE Data Flood 8/4/09
Outline
✦ Where is this flood of data coming from?
✦ What kind of tool is appropriate for this amount of data?
✦ What kind of a tool is “bioinformatics”?
✦ How about an example?

3 DNA sequencing data flood
[Table: DNA sequenced per day (bp/day) by year and technology: gels, ABI 370, ABI 377, ABI …, …/FLX, ???]

4 The data flood: DNA example
[Table: year, bp/day, and notes (manual: φX; gel: ABI …; gel: ABI …; capillary: ABI …; …/FLX; 2012: ??), each data volume compared to a volume of water: 1 L; barrel (176 gallons); big pool (2×6×12 m); football field, 20 m deep; Lakes Michigan/Huron; all Great Lakes (nearly); ocean?]

5 Bioinformatics tools
Year  Data volume
1977  1 L
1986  barrel
1995  big pool
1998  football field
2008  Lake Michigan
2009  Great Lakes
2012  ocean?
Technology: hose, pfd, kayak, Orca?, bigger boat?, Glomar?, spoon

6 Bioinformatics: bigger boat?
[Diagram with the labels: You, Your hypothesis, Data, The Computer (bioinformatics), Your thesis]

7 Reflection on the metaphor
✦ At some point, you can use fundamentally different techniques: spoons versus boats
✦ At some point, you can test fundamentally new hypotheses: not “we need a smaller shark”
✦ Sometimes the old technology is still good: the kayak was appropriate in this picture
✦ The new technology may be for a different purpose: fishing versus deep sea exploration

8 ✦ Technology quiz!

9 What does this do?

10 What does this do? Not that! THIS! A Bigger Boat. Whatever you tell it to do!

11 What is Bioinformatics?
✦ Bioinformatics is what you tell the computer to do with your data

12 Of Boats and Bioinformatics
Bioinformatics is what you do with the boat you are in during the data flood.
You might be able to do more with a bigger boat.

13 Sampling emergent diversity
✦ Get ALL DNA along an age-variant transect
  10 samples per site
  Time since post-glacial exposure: 5y, 19y, 40y, 63y, 100y, and 150y
  “Chronoclines” sample ecosystems by age
✦ Who’s there?
✦ How does the ecosystem change over time?

14 Bioinformatics problems
✦ Estimate α diversity: number of “species” in each sample and age group
✦ Estimate β diversity: amount of variation in “species” between age groups
✦ Determine which species (no quotes) are present in each sample (not part of this talk)
Biological questions:
  How do soil bacteria respond to retreating glaciers?
  How do soil microbial communities change?
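The two diversity measures above can be made concrete with a small sketch. It assumes each read has already been assigned an OTU (“species”) label; the labels and counts here are hypothetical. Observed richness simply counts OTUs. Chao1 is a standard richness estimator swapped in for illustration (the talk does not say which estimator was used), and β diversity is shown as Jaccard distance on presence/absence, one of several common choices.

```python
from collections import Counter
from itertools import combinations


def observed_richness(counts):
    """Alpha diversity: number of OTUs ("species") seen at least once."""
    return sum(1 for c in counts.values() if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 estimate of total richness, including unseen OTUs.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the numbers
    of OTUs seen exactly once (singletons) and exactly twice (doubletons).
    """
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def jaccard_distance(a, b):
    """Beta diversity: 1 - |shared OTUs| / |all OTUs| for two communities."""
    present_a = {otu for otu, c in a.items() if c > 0}
    present_b = {otu for otu, c in b.items() if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical OTU label for each read, pooled by age group.
reads_by_age = {
    "5y": ["otu1", "otu1", "otu2", "otu3"],
    "150y": ["otu1", "otu4", "otu4", "otu5", "otu6"],
}
counts = {age: Counter(reads) for age, reads in reads_by_age.items()}

for age, c in counts.items():
    print(age, observed_richness(c), round(chao1(c), 2))  # 5y 3 3.5 / 150y 4 5.5
for x, y in combinations(counts, 2):
    print(x, y, round(jaccard_distance(counts[x], counts[y]), 3))  # 5y 150y 0.833
```

Chao1 exceeds the observed count when many OTUs are singletons, which is the usual situation for deeply sampled soil communities.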

15 Lots of data (post QC)
Age    Samples  Sequences  DNA (Mbp)
5y     9        35,…       …
19y    10       41,…       …
40y    8        33,…       …
63y    9        41,…       …
100y   8        41,…       …
150y   8        40,…       …
Total  52       233,…      …
Note: A SMALL run; the maximum is 37 GB per 8-hour run, 1.6 Bbp/day

16 Bioinformatics objectives
Determine species; cluster by species; cluster by age
Explain the data in terms of biological processes and age (tell a story)
Too much data: 233K sequences!
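Why 233K sequences is “too much” can be made concrete with a back-of-the-envelope count, assuming (the talk does not say so) a naive clustering that compares every pair of sequences. With n sequences there are n(n−1)/2 pairs; rounding the total to n = 233,000:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{233\,000}{2} = \frac{233\,000 \times 232\,999}{2} = 27\,144\,383\,500 \approx 2.7 \times 10^{10}
```

That is roughly 27 billion pairwise comparisons before a single clustering decision is made, and the count grows quadratically as runs get larger.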

17 Trick: Turn it upside down
Cluster each of the 52 samples (approx. 6k sequences each) and choose a proxy sequence
Cluster the proxies by age (approx. 40k each)
Cluster the combined sequences to get species (quantify richness)
Build a +/- matrix
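The “upside down” strategy can be sketched as divide-and-conquer clustering. This is a minimal sketch under stated assumptions, not the method used in the talk: clustering is a simple greedy, proxy-based scheme; identity is computed position by position without alignment; and the 0.97 default threshold is the commonly used 97% OTU cutoff, assumed here. The function names (`identity`, `greedy_cluster`, `upside_down`) and the toy reads are hypothetical.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of matching positions, compared position by position.

    Simplifying assumption: reads cover the same ~300 bp region, so no
    alignment is done. Real pipelines align sequences first.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(items, threshold):
    """Greedy clustering of (sequence, tag) items.

    Each item joins the first cluster whose proxy (first member) it matches
    at >= threshold identity; otherwise it starts a new cluster as its proxy.
    """
    clusters = []
    for item in items:
        for cluster in clusters:
            if identity(item[0], cluster[0][0]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def upside_down(samples, threshold=0.97):
    """samples: {sample_id: (age, [sequences])} -> +/- matrix, one row per species."""
    ages = sorted({age for age, _ in samples.values()})

    # Step 1: cluster within each sample; keep one proxy per cluster, tagged by age.
    proxies_by_age = defaultdict(list)
    for age, seqs in samples.values():
        for cluster in greedy_cluster([(s, age) for s in seqs], threshold):
            proxies_by_age[age].append(cluster[0])

    # Step 2: cluster the sample proxies within each age group; keep their proxies.
    age_proxies = []
    for proxies in proxies_by_age.values():
        age_proxies.extend(c[0] for c in greedy_cluster(proxies, threshold))

    # Step 3: cluster all age-level proxies together; each cluster is one "species".
    species = greedy_cluster(age_proxies, threshold)

    # Step 4: a species is present (+) at an age if any of its members came from that age.
    return [{age: any(tag == age for _, tag in c) for age in ages} for c in species]


# Hypothetical toy reads; ages are years since exposure.
samples = {
    "s1": (5, ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]),
    "s2": (150, ["ACGTACGTAC", "GGGGCCCCAA"]),
}
matrix = upside_down(samples, threshold=0.9)
print(len(matrix), "species")  # 3 species
for row in matrix:
    print(row)
```

Because only proxies move up a level, each later clustering step works on far fewer sequences than clustering all reads at once. The trade-off is that greedy, proxy-based assignments are approximate and depend on input order.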

18 Bioinformatics challenges
✦ Move data between computers (IGS, laptop, IBEST Core)
✦ File the data in a retrievable way
✦ Associate metadata with data
✦ Cluster sequences within/between samples
✦ Associate clusters with species
✦ Compute diversity statistics
✦ Prepare publications and talks
✦ (much more)

19 Conclusions
✦ Biology
  There are thousands of species of bacteria in arctic soil
  The number of bacterial species increases as time since post-glacial exposure increases
✦ Algorithmics (want a job?)
  “Quantity has a quality all its own” (V. I. Lenin)
  Need new algorithms to use new hardware
  Database/dataset management is crucial

20 Thanks!
✦ Ursel Schüette
✦ Zaid Abdo
✦ Jacob Pierson
✦ Larry Forney
✦ Rob Lyon
✦ The Forney-Top lab
✦ John Bunge, Cornell
✦ The Relational Database project, MSU
✦ to INBRE for the excuse
✦ to IBEST for the science
✦ to NIH, NSF, and UI for the money (P20RR16448, P20RR016454, EPS080935)

21 ✦ Discussion?

22 Extra stuff (intentionally blank)

23 Roche 454: a genome a day

24 Metagenomics
✦ Harvest approximately the first 300 bp of every 16S rRNA gene, in all samples
  Ribosome: required to translate mRNA into protein (conserved)
  Common marker for microbial species
✦ Cluster by evolutionary relationships (“species”)
✦ Analyze by chronocline

25 Future work: same tune, new lyrics
✦ Data from the human microbiome
  How do microbial communities vary between healthy and sick people?
✦ Data from polluted soil (Yangtze River, PRC)
  How do microbial communities vary as pollution increases?
✦ Data from longitudinal transects
  How does microbial diversity change with latitude?