Bioinformatics at NASA or Yes Virginia, NASA does do biology! Michael New Astrobiology Discipline Scientist Maryland
Bioinformatics at NASA? Bioinformatics is used at NASA in several ways. Fundamental Space Biology: How do organisms, including humans, adapt to the space environment? Planetary Protection: What is the nature of the community of micro-organisms living in spacecraft assembly areas and on spacecraft? Astrobiology: What can the genomes of life on Earth tell us about the origin, evolution, distribution and future of life on Earth and the potential for life elsewhere? (4 March 2009, Bioinformatics Technology Forum)
Fundamental Space Biology: How are molecular signals, pathways, and products in humans and model organisms (e.g., mice) altered by exposure to microgravity and space radiation factors? How is drug metabolism affected by space-related effects? Are there critical stages in development that are affected by altered gravity? Why does the virulence of pathogens appear to increase in space?
Small Sats (30 cm x 10 cm x 10 cm) + On-board Expression Measurements + Bioinformatics. How to make inferences?
Making good inferences is the key. [Diagram labels: Experimental Data, Background Knowledge, Previous Results, Analysis Algorithm, New Knowledge] Andrew Pohorille, Jeff Shrager and Steve Racunas, NASA Center for Astrobioinformatics; Karl Schweighofer
Expression studies are inconclusive
Hypothesis       Stanford Medical Sch. Experiments
Jnk → c-Jun      p = 0.51
Jnk ↛ c-Jun      p = 0.46
p value: probability that the posterior of H, p(D|H), is just spurious (i.e., the same posterior is likely with random D when ¬H).
Background knowledge makes a difference!
Hypothesis       Stanford Medical Sch. Experiments    With Background Knowledge
Jnk → c-Jun      p = 0.51                             p = 0.77
Jnk ↛ c-Jun      p = 0.46                             p = 0.003
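The slides do not describe the analysis algorithm, so the following is only a minimal sketch of the general idea, using plain Bayes' rule rather than the authors' method. It shows how a prior drawn from background knowledge can separate two hypotheses that the data alone barely distinguish. The function name `posterior` and all numbers are hypothetical and are not the values from the table.

```python
def posterior(prior_h, lik_h, lik_not_h):
    """Bayes' rule for a binary hypothesis: returns P(H | D).

    prior_h   -- P(H), belief in H before seeing the data
    lik_h     -- P(D | H), likelihood of the data if H is true
    lik_not_h -- P(D | not H), likelihood of the data if H is false
    """
    evidence = prior_h * lik_h + (1.0 - prior_h) * lik_not_h
    return prior_h * lik_h / evidence


# Hypothetical likelihoods: the expression data only weakly favor H.
lik_h, lik_not_h = 0.52, 0.48

# No background knowledge: a flat prior leaves the question open.
flat = posterior(0.5, lik_h, lik_not_h)

# Hypothetical background knowledge (e.g., prior literature) favoring H.
informed = posterior(0.9, lik_h, lik_not_h)

print(f"flat prior:     P(H | D) = {flat:.3f}")
print(f"informed prior: P(H | D) = {informed:.3f}")
```

With a flat prior the posterior stays close to one half; with an informative prior the same weak data leave a much more decisive result.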
Need a system for evaluating biological models
Planetary Protection: What organisms are present in and on spacecraft? How can we assess the “bioburden” of spacecraft? How can we ensure that no Terran life hitchhikes to a clement spot on another planet? How can we assess the safety of returned samples?
Assessing “crud”: What is the diversity of low-biomass samples taken from a spacecraft assembly clean room? Comparing two new techniques: Affymetrix’s PhyloChip and 454 sequencing.
Third-Generation PhyloChip
Additional advancements:
– Smaller feature size, with no increase in chip cost.
– Smaller sample volumes, with decreased cost in reagents.
– Improved analysis: more sophisticated fragmentation method, refined analysis software, improved validation approach.
Relatively inexpensive and suitable for repeated assays, but less robust quantitation.
454 Sequencing: The Sogin Survey Method. In a single run, 454 technology can generate up to 200,000 independent sequence reads of ~100 bases each. Comprehensively samples short variable rRNA regions. First report on deep-sea diversity estimates 10 to 100 times more species than previously suspected (Sogin et al., PNAS 2006). A few species are common; the vast majority are rare. This method is easily adapted to spacecraft bioburden inventory. Gives some estimate of quantity as well as phylogeny. Method is expensive and requires large amounts of DNA. More suitable for infrequent assays of pooled samples. (454 Inc.)
Family-level Comparisons. G2 PhyloChip: families detected: 96; detected exclusively on PhyloChip: 31. 454 V6 pyrosequencing: families detected: 87; detected exclusively by 454 pyrosequencing: 22. [Venn diagram: 65 shared, 31 PhyloChip only, 22 pyrosequencing only] Overall, both methods showed high agreement of detection at the family level, but only when data from all temperature gradients was compiled.
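The family counts on this slide can be cross-checked with simple set arithmetic. The sketch below assumes that the figure of 22 refers to families detected only by 454 pyrosequencing (the slide's label is ambiguous); the function name `venn_counts` is hypothetical.

```python
def venn_counts(total_a, only_a, total_b, only_b):
    """Derive two-set Venn diagram regions from per-method totals.

    total_* -- number of families a method detected
    only_*  -- number of families detected by that method alone
    """
    shared_from_a = total_a - only_a
    shared_from_b = total_b - only_b
    if shared_from_a != shared_from_b:
        raise ValueError("totals and exclusive counts are inconsistent")
    return {
        "shared": shared_from_a,
        "only_a": only_a,
        "only_b": only_b,
        "union": shared_from_a + only_a + only_b,
    }


# PhyloChip: 96 detected, 31 exclusive. 454: 87 detected, 22 exclusive
# (reading the 22 as 454-only is an assumption).
counts = venn_counts(96, 31, 87, 22)
print(counts)
```

Both methods give the same overlap (96 - 31 = 87 - 22 = 65), which matches the three numbers left over from the slide's diagram.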
Astrobiology: Life in a Universal Context. How does life begin and evolve? What do the rock record and genomes tell us? Does life exist elsewhere in the Universe? Life as we know it? “Weird” life? How can either be detected? What is the future for life on Earth and beyond?
Three case studies: (1) Development of a new tool to assess horizontal gene transfer (HGT): Peter Gogarten and Olga Zhaxybayeva. (2) Use of standard tools to look for independent “leaps to land”: Zoe Cardon, Louise Lewis, and Harry Frank. (3) Resurrecting ancient proteins: Steve Benner, et al.
How can we assess the degree of HGT present on the early Earth? A quartet is the smallest unit of phylogenetic information. Each quartet can have three unrooted tree topologies. Support for different quartet topologies can be summarized for all gene families.
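As an illustration of the statement that each quartet has three unrooted topologies, here is a minimal sketch that enumerates the embedded quartets of a taxon set and the three possible splits of each. The function names and taxon labels are hypothetical; this is not the authors' quartet decomposition software.

```python
from itertools import combinations


def embedded_quartets(taxa):
    """Return every 4-taxon subset (quartet) of the given taxa."""
    return list(combinations(taxa, 4))


def quartet_topologies(a, b, c, d):
    """Return the three unrooted topologies of a quartet.

    Each topology is written as a split: the two pairs of taxa
    that sit on opposite sides of the single internal branch.
    """
    return [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]


taxa = ["A", "B", "C", "D", "E"]  # placeholder genome labels
quartets = embedded_quartets(taxa)
for quartet in quartets:
    for left, right in quartet_topologies(*quartet):
        print(f"{left[0]}{left[1]}|{right[0]}{right[1]}")
```

In an analysis of the kind described, each gene family would lend support to one of the three splits for every quartet it covers; tallying that support across families is what separates the plurality signal from the conflicting one.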
Why use embedded quartets? No assumption that all genes in a genome have the same phylogenetic history. The total number of quartets is much smaller than the number of tree topologies, which makes it possible to evaluate all quartets. Gene families present in only a few of the analyzed genomes can be included in the analyses. Phylogenetic signal can be divided into the plurality consensus and the conflicting signal. Allows us to partition analyzed genomes according to some scenario (e.g., grouping by ecology) and retrieve gene families that support or conflict with it.
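To make the size argument concrete, the sketch below compares the number of quartets, C(n, 4), with the number of unrooted bifurcating tree topologies, (2n - 5)!!, for n taxa. Both are standard counting formulas; the function names are hypothetical, and n = 11 is used only because the cyanobacterial example in this talk analyzes 11 genomes.

```python
from math import comb, prod


def num_quartets(n):
    """Number of 4-taxon subsets of n taxa: C(n, 4)."""
    return comb(n, 4)


def num_unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 3 taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    # Product of the odd numbers 1, 3, ..., 2n - 5.
    return prod(range(1, 2 * n - 4, 2))


n = 11
print(f"quartets for {n} taxa:        {num_quartets(n):,}")
print(f"quartet topologies to score: {3 * num_quartets(n):,}")
print(f"unrooted tree topologies:    {num_unrooted_trees(n):,}")
```

For 11 taxa this gives 330 quartets (990 quartet topologies) against 34,459,425 full tree topologies, which is why exhaustive evaluation is feasible for quartets but not for whole trees.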
Example: Cyanobacteria & their Genes. Analyzed gene families in 11 sequenced cyanobacterial genomes using the quartet decomposition method. Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes. Across short phylogenetic distances, all types of genes appear to be equally affected by transfer. Across large phylogenetic distances, genes encoding metabolic functions are more frequently transferred, and genes involved in transcription and translation are less frequently transferred. Olga Zhaxybayeva, J. Peter Gogarten, Robert L. Charlebois, W. Ford Doolittle and R. Thane Papke: "Phylogenetic Analyses Of Cyanobacterial Genomes: Quantification Of Horizontal Gene Transfer Events", Genome Research, 2006, 16:1099-1108.
What traits were needed for the “leap to land”? [Figure: phylogeny of green plants showing the 5 major green algal classes (Chlorophyceae, Trebouxiophyceae, Ulvophyceae, Charophyceae, Prasinophyceae; sensu Mattox and Stewart, 1984; a recent revision divides Charophyceae into 6 classes) and the embryophytes (terrestrial green plants). The famous leap to land: N = 1. Leaps of eukaryotic green algae from aquatic or marine habitats to land: N = ?] Numerous independent habitat transitions provide statistical power for detecting traits correlated with successful leaps from water to land.
Bioinformatics used to: (1) Infer evolutionary relationships among known aquatic and recently isolated desert algae using data from nucleotide sequences (large data sets, multiple genes) to estimate diversity and describe new species. (2) Estimate the number of transitions from aquatic to terrestrial habitats (Bayesian methods). To date, we estimate at least 40 evolutionarily independent transitions! (3) Test the correlation of source habitat type with traits that occur in our desert and related aquatic algae, using comparative statistical methods that take into account evolutionary relationships among taxa. Lewis and Lewis 2005, Systematic Biology, 54:936-947; Gray et al. 2007, Plant Cell and Environment, 30:1240-1255; Cardon et al. 2008, Bioscience, 58:114-122; Lewis, unpublished.
Moving from single cells to multicellular animals. This seems hard to do from the perspective of molecular biology: change the goal of life from replicating cells as fast as possible (what bacteria do) to replicating cells under control, and then not at all (what you do). The fossil record makes the transition seem sudden (but the fossil record may be missing many things). We are not certain that the transition was not driven by planetary change, such as the emergence of abundant oxygen in the atmosphere. Understanding how this transition took place on Earth helps NASA infer how likely it is to have taken place elsewhere, a key part of the Drake equation used to estimate the likelihood of intelligent life elsewhere in the cosmos.
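For reference, the Drake equation mentioned on this slide is commonly written as the product below. The symbols follow the usual textbook form and are not taken from the slides. Which factor absorbs the transition to multicellular animals is a matter of interpretation; it is most often folded into the biological terms f_l and f_i.

```latex
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```

Here N is the number of detectable civilizations in the galaxy, R_* the rate of star formation, f_p the fraction of stars with planets, n_e the number of habitable planets per planetary system, f_l the fraction of those on which life arises, f_i the fraction of those on which intelligence evolves, f_c the fraction that produce detectable signals, and L the length of time such signals are produced.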
Since fossils are no help, turn to genomes. Exhaustive matching supported models for protein sequence evolution: new tools to score amino acid replacements; tools to extend the model that scores replacements; tools to exploit homoplasy, compensatory covariation, and other non-Markovian behaviors in the evolution of real proteins diverging under functional constraints. Gonnet, G. H., Cohen, M. A., Benner, S. A. (1992) Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445. [Figure annotation: Multicellularity emerges. What happened here in the genome?] Sequencing of a choanoflagellate provides an outgroup, a lineage diverging just before multicellularity emerges. King, N. et al. (2008) The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788.
So what happened? Many things. Steroid receptors emerged, together with oxygen-dependent proteins that make steroid hormones; key at many places in metazoan biology. Protein tyrosine phosphorylating kinases emerged from serine kinases. Protein tyrosine phosphatases emerged (from an unknown source). Kinase substrates emerged that were phosphorylated on tyrosines. SH2 domains that bind to phosphotyrosine emerged (unknown source). And not just one example: lots of them, with correlated evolution. JAK is a two-domain kinase. The domains are duplicates of a single domain; the duplication occurred in this episode. STAT is a family of substrates for JAK, also arising by duplication at the same time as the JAK domains duplicated. [Figure labels: JAK, STAT]
How do we know that the ancestral proteins were doing phosphorylation, being phosphorylated, etc. at that time? Bring the experimental method to bear on historical hypotheses, using biotech to resurrect genes and proteins having the inferred ancestral sequence and studying their behavior in the lab. Consider the SH2 domains, which bind to phosphotyrosine, a new function emerging together with multicellularity. The SH2 domains are a large family having various binding specificities. Resurrection shows that the ancestral proteins bind as well, and shows their specificity. (Benner, et al., unpublished) [Figure labels: binds (Gln or Tyr)-Asn-Tyr; binds (Ile or Val)-Asn-(Val or Pro); outgroup]
Acknowledgements: Andrew Pohorille (NASA ARC), Jeff Shrager (Stanford), Stephen Racunas (Stanford), Karl Schweighofer (SETI Inst), Catharine Conley (NASA ARC), Mitch Sogin (MBL), Kasthuri Venkateswaran (JPL), Gary Andersen (LBL), J. Peter Gogarten (U Conn), Olga Zhaxybayeva (Dalhousie), Zoe Cardon (MBL), Louise Lewis (U Conn), Frank Lewis (U Conn), Steve Benner (FFAME), Jason Raymond (UC Merced), Rob Knight (CUB), Eric Gaucher (GA Tech)
Questions? Comments? Brickbats?