The data flood: We need a bigger boat James A. Foster The Initiative for Bioinformatics and Evolutionary Studies (IBEST) Biological Sciences, Bioinformatics.


1 The data flood: We need a bigger boat James A. Foster The Initiative for Bioinformatics and Evolutionary Studies (IBEST) Biological Sciences, Bioinformatics and Computational Biology University of Idaho

2 JAF INBRE Data Flood 8/4/09 Outline ✦ Where is this flood of data coming from? ✦ What kind of tool is appropriate for this amount of data? ✦ What kind of a tool is “bioinformatics”? ✦ How about an example?

3 DNA sequencing data flood
Year   bp/day          Technology
1977   7.35            Gels
1986   50-ish          ABI 370
1995   19,000          ABI 377
1998   400,000         ABI 3700
2008   1,600,000,000   454
2009   3,200,000,000   454/FLX
2012   ??              ???

4 The data flood: DNA example
Year   bp/day          Notes            Water
1977   7.35            Manual: φx174    1 L
1986   50-ish          Gel: ABI 370     Barrel (176 gallons)
1995   19,000          Gel: ABI 377     Big pool (2 x 6 x 12 m)
1998   400,000         Cap: ABI 3700    Football field, 20 m deep
2008   1,600,000,000   454              Lakes Michigan/Huron
2009   3,200,000,000   454/FLX          All Great Lakes (nearly)
2012   ??                               Ocean?
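The growth in the table can be made concrete with a little arithmetic. The sketch below is an illustration rather than part of the talk: it takes the slide's bp/day figures, reads the "50-ish" entry as 50, and assumes smooth exponential growth between consecutive rows in order to estimate a doubling time. The function name `doubling_time_years` is made up for this example.

```python
import math

# (year, bp/day) pairs as listed on the slide.
# Assumption: the 1986 "50-ish" entry is taken as exactly 50.
throughput = [
    (1977, 7.35),
    (1986, 50),
    (1995, 19_000),
    (1998, 400_000),
    (2008, 1_600_000_000),
    (2009, 3_200_000_000),
]


def doubling_time_years(y0, v0, y1, v1):
    """Years per doubling, assuming exponential growth between two points."""
    return (y1 - y0) / math.log2(v1 / v0)


# Fold increase and implied doubling time for each consecutive pair of rows.
for (y0, v0), (y1, v1) in zip(throughput, throughput[1:]):
    fold = v1 / v0
    years = doubling_time_years(y0, v0, y1, v1)
    print(f"{y0}-{y1}: {fold:,.0f}x, doubling every {years:.2f} years")
```

The last pair (2008 to 2009) is a clean doubling in one year, which is the pace the "bigger boat" metaphor is reacting to.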

5 Bioinformatics tools
Year   Data volume      Technology
1977   1 L              spoon
1986   barrel           hose
1995   big pool         pfd
1998   football field   Kayak
2008   Lake Michigan    Orca?
2009   Great Lakes      bigger boat?
2012   ocean?           Glomar?

6 Bioinformatics: bigger boat? [Diagram labels: You; Your hypothesis; Hypo; Data; The Computer (bioinformatics); Your thesis]

7 Reflection on the metaphor ✦ At some point, you can use fundamentally different techniques: spoons versus boats ✦ At some point, you can test fundamentally new hypotheses: not “we need a smaller shark” ✦ Sometimes the old technology is still good: the kayak was appropriate in this picture ✦ The new technology may be for a different purpose: fishing versus deep sea exploration

8 ✦ Technology quiz!

9 What does this do?

10 What does this do? Not that! THIS! A Bigger Boat. Whatever you tell it to do!

11 What is Bioinformatics? ✦ Bioinformatics is what you tell the computer to do with your data

12 Of Boats and Bioinformatics. Bioinformatics is what you do with the boat you are in during the data flood. You might be able to do more with a bigger boat.

13 Sampling emergent diversity ✦ Get ALL DNA along an age-variant transect: 10 samples per site; time since exposure: 5y, 19y, 40y, 63y, 100y, and 150y; “chronoclines” sample ecosystems by age ✦ Who’s there? ✦ How does the ecosystem change over time?

14 Bioinformatics problems ✦ Estimate α diversity: number of “species” in each sample and age group ✦ Estimate β diversity: amount of variation in “species” between age groups ✦ Determine which species (no quotes) are present in each sample (not part of this talk). Biological questions: How do soil bacteria respond to retreating glaciers? How do microbial soil communities change?
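As a rough illustration of the two estimates named above, the sketch below computes observed richness (a simple α diversity) and Jaccard dissimilarity (one possible β diversity) from cluster labels. The talk does not say which estimators were actually used, so both choices are assumptions, and the labels and data are hypothetical.

```python
def richness(otus):
    """Observed alpha diversity: number of distinct "species" (cluster labels)."""
    return len(set(otus))


def jaccard_dissimilarity(a, b):
    """Beta diversity between two groups: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty groups are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical cluster labels per age group (not data from the talk).
groups = {
    "5y": ["otu1", "otu2", "otu2", "otu3"],
    "150y": ["otu2", "otu3", "otu4", "otu5", "otu5"],
}

for age, otus in groups.items():
    print(age, "richness:", richness(otus))
print("5y vs 150y:", jaccard_dissimilarity(groups["5y"], groups["150y"]))
```

Observed richness underestimates true richness when sampling is incomplete, so a real analysis would likely use a statistical richness estimator instead of a raw count.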

15 Lots of data (post QC)
Age     Samples   Sequences   DNA (Mbp)
5y      9         35,092      8.77
19y     10        41,494      10.37
40y     8         33,665      8.42
63y     9         41,767      10.44
100y    8         41,178      10.29
150y    8         40,210      10.05
Total   52        233,406     58.35
Note: a SMALL run; the maximum is 37 GB per 8-hour run, 1.6 Bbp/day
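A quick sanity check on the table, offered as a sketch: the code below re-adds the per-age rows, compares them with the reported totals, and derives the mean read length implied by the totals. The variable names are illustrative, and the tolerance on the Mbp total is an assumption to allow for rounding in the slide.

```python
# Rows from the slide: (age, samples, sequences, DNA in Mbp).
rows = [
    ("5y", 9, 35_092, 8.77),
    ("19y", 10, 41_494, 10.37),
    ("40y", 8, 33_665, 8.42),
    ("63y", 9, 41_767, 10.44),
    ("100y", 8, 41_178, 10.29),
    ("150y", 8, 40_210, 10.05),
]
reported_samples, reported_sequences, reported_mbp = 52, 233_406, 58.35

total_samples = sum(r[1] for r in rows)
total_sequences = sum(r[2] for r in rows)
total_mbp = sum(r[3] for r in rows)

# Mean read length in bp implied by the reported totals.
mean_read_bp = reported_mbp * 1e6 / reported_sequences

print(total_samples == reported_samples)       # sample counts agree
print(total_sequences == reported_sequences)   # sequence counts agree
print(abs(total_mbp - reported_mbp) < 0.02)    # Mbp agrees up to rounding
print(f"mean read length: about {mean_read_bp:.0f} bp")
```

The implied mean read length of roughly 250 bp is in line with the "approximately first 300bp" of 16S rRNA mentioned in the extra slides.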

16 Bioinformatics objectives: determine species; cluster by species; cluster by age. Explain data in terms of biological processes and age (tell a story). Too much data: 233K sequences!

17 Trick: Turn it upside down. Cluster each of 52 samples (approx. 6k each), choose a proxy sequence. Cluster proxies by age (approx. 40k each). Cluster combined sequences to get species (quantify richness). Build +/- matrix. [Figure: example +/- matrix]
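The "upside down" idea can be sketched in a few lines. The code below is a simplified illustration, not the method used in the talk: it uses a crude position-by-position identity in place of a real alignment score, a greedy first-match clustering rule, and collapses the by-age stage so that there are only two levels (within sample, then across all proxies). It also assumes "+" means a species is present in a sample and "-" means it is absent. All function names, the threshold, and the toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a crude stand-in for an alignment score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def greedy_cluster(seqs, threshold):
    """Each sequence joins the first proxy it matches, else founds a new cluster."""
    clusters = []  # list of (proxy, members)
    for s in seqs:
        for proxy, members in clusters:
            if identity(s, proxy) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


def presence_matrix(samples, threshold):
    # Stage 1: cluster within each sample and keep one proxy per cluster.
    proxies = {
        name: [p for p, _ in greedy_cluster(seqs, threshold)]
        for name, seqs in samples.items()
    }
    # Stage 2: cluster the pooled proxies to define "species".
    pooled = [p for ps in proxies.values() for p in ps]
    species = [p for p, _ in greedy_cluster(pooled, threshold)]
    # +/- matrix: one row per species, one column per sample.
    matrix = [
        "".join(
            "+" if any(identity(p, sp) >= threshold for p in proxies[name]) else "-"
            for name in samples
        )
        for sp in species
    ]
    return species, matrix


# Toy input (not data from the talk): two samples of 10 bp "reads".
samples = {
    "s1": ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"],
    "s2": ["AAAAAAAAAA", "GGGGGGGGGG"],
}
species, matrix = presence_matrix(samples, threshold=0.8)
for sp, row in zip(species, matrix):
    print(sp, row)
```

The payoff is in the sizes: the expensive all-against-all comparison only ever runs on one sample's reads or on the much smaller set of proxies, never on all 233K sequences at once.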

18 Bioinformatics challenges ✦ Move data between computers (IGS, laptop, IBEST Core) ✦ File the data in a retrievable way ✦ Associate metadata with data ✦ Cluster sequences within/between samples ✦ Associate clusters with species ✦ Compute diversity statistics ✦ Prepare publications and talks ✦ (much more)

19 Conclusions ✦ Biology: There are thousands of species of bacteria in arctic soil. The number of bacterial species increases as time of post-glacial exposure increases. ✦ Algorithmics (want a job?): “Quantity has a quality all its own” (V.I. Lenin). Need new algorithms to use new hardware. Database/dataset management is crucial.

20 Thanks! ✦ Ursel Schüette ✦ Zaid Abdo ✦ Jacob Pierson ✦ Larry Forney ✦ Rob Lyon ✦ The Forney-Top lab ✦ John Bunge, Cornell ✦ The Relational Database project, MSU ✦ to INBRE for the excuse ✦ to IBEST for the science ✦ to NIH, NSF, and UI for the money (P20RR16448, P20RR016454, EPS080935)

21 ✦ Discussion?

22 Extra stuff (intentionally blank)

23 Roche 454: a genome a day

24 Metagenomics ✦ Harvest approximately the first 300 bp of every 16S rRNA molecule, all samples. Ribosome: required for translation (conserved). Common marker for microbial species ✦ Cluster by evolutionary relationships (“species”) ✦ Analyze by chronocline

25 Future work: same tune, new lyrics ✦ Data from human microbiome: How do microbial communities vary between healthy and sick people? ✦ Data from polluted soil (Yangtze River, PRC): How do microbial communities vary as pollution increases? ✦ Data from longitudinal transects: How does microbial diversity change with latitude?

