Computational biology and computational biologists Tandy Warnow, UT-Austin Department of Computer Sciences Institute for Cellular and Molecular Biology.
Computational biology and computational biologists
Tandy Warnow, UT-Austin
Department of Computer Sciences
Institute for Cellular and Molecular Biology
Program in Evolution, Ecology, and Behavior
Center for Computational Biology and Bioinformatics
Two computational biologists
- One computational biologist needs to know a lot of biology.
- Another needs to know a lot of mathematics.
Another two computational biologists
- Craig Benham: mathematics of stressed DNA (understanding regulation)
- Gene Myers: whole genome sequencing and BLAST
Two different types of computational biologists
- One works on mathematical or computational problems (derived from biology) that are well posed and hard to solve; these need significant computer science/math/statistics.
- One works on biological problems that are not well posed, and where the computer science/math/statistics needed may be “easier”.
- Both can be problems that are important to biologists, and which they cannot solve without computational biologists’ involvement.
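My view of Pasteur’s Quadrant
[Figure: quadrant diagram with one axis running from "Hard math" to "Easy math" and the other from "Easily applicable" to "Not applicable". Regions are marked "What computational scientists want", "What computational scientists do", and "What biologists want".]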
Phylogeny
[Figure: phylogenetic tree with leaves Orangutan, Gorilla, Chimpanzee, and Human. From the Tree of Life Website, University of Arizona.]
DNA Sequence Evolution
[Figure: the ancestral sequence AAGACTT evolving down a tree along a timeline marked -3 mil yrs, -2 mil yrs, -1 mil yrs, and today. Intermediate sequences include TGGACTT, AAGGCCT, AGGGCAT, TAGCCCT, and AGCACTT; the present-day sequences at the leaves are TAGCCCA, TAGACTT, AGCGCTT, AGCACAA, and AGGGCAT.]
Molecular Systematics
[Figure: the observed sequences TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, and AGGGCAT, and a tree on the leaves U, V, W, X, Y inferred from them.]
Computational challenges for Assembling the Tree of Life
- 8 million species for the Tree of Life: we cannot currently analyze more than a few hundred (and even this can take years).
- We need new methods for inferring large phylogenies. These are hard optimization problems!
- We need new software for visualizing large trees.
- We need new database technology.
- Not all phylogenies are trees, so we need methods for inferring phylogenetic networks.
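One reason these optimization problems are hard is the size of the search space. The slides do not state this count, but it is a standard result that the number of unrooted binary trees on n labeled leaves is (2n-5)!! = 3 x 5 x ... x (2n-5). A minimal sketch (the function name is ours, chosen for illustration):

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves.

    Uses the standard formula (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3.
    """
    if n_leaves < 3:
        raise ValueError("formula applies to trees with at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product when n == 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, num_unrooted_binary_trees(n))
```

Even at a few dozen leaves the count is far too large to enumerate, which is why exhaustive search is not an option for datasets with hundreds of species.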
Time is a bottleneck for MP and ML
[Figure: MP score plotted over the space of phylogenetic trees, with a local optimum and the global optimum marked.]
- Systematists tend to prefer trees with the optimal maximum parsimony (MP) score or optimal maximum likelihood (ML) score; however, both problems are hard to solve.
- Our experimental studies show that polynomial time methods do not do as well as MP or ML heuristics when trees are big and have high rates of evolution.
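To make the "MP score" concrete: scoring one fixed tree is cheap, and the difficulty lies in searching over trees. The sketch below computes the parsimony score of a given rooted binary tree with Fitch's algorithm, a standard method that the slides do not name. The nested-tuple tree encoding, the function names, and the four toy sequences are assumptions made for illustration, not material from the talk.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change at this node.
        return common, left_cost + right_cost
    # Children disagree: one substitution is needed on some child edge.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the per-site Fitch costs over an alignment of equal-length sequences."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for i in range(length):
        column = {name: s[i] for name, s in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    # Hypothetical toy alignment with two candidate topologies.
    seqs = {"A": "AAGA", "B": "AAGG", "C": "TTCA", "D": "TTCG"}
    tree_ab_cd = (("A", "B"), ("C", "D"))
    tree_ac_bd = (("A", "C"), ("B", "D"))
    print(parsimony_score(tree_ab_cd, seqs))  # 5
    print(parsimony_score(tree_ac_bd, seqs))  # 7
```

A hill-climbing heuristic of the kind discussed in the slides repeatedly rescores neighbouring trees with a routine like this one and keeps the better tree, which is how it can stall at a local optimum.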
MP/ML heuristics
[Figure, labeled "Fake study": performance of a hill-climbing heuristic, plotting the MP score of the best trees found against time.]
DCM-boosting: Speeding up MP/ML heuristics
[Figure, labeled "Fake study": MP score of the best trees found against time, comparing the performance of a hill-climbing heuristic with the desired performance.]
Characteristics
- The research can be published in mathematics/statistics/computer science journals and conferences, and evaluated along these lines.
- These people can be faculty in Math/Statistics/Computer Science departments, and *maybe* in some biology departments.
- Substantive improvements are hard, but if achieved will have enormous impact on many biologists.
- Why? These are old problems, endorsed by biologists, of a computational nature.
The “other” type
- Deals with problems like: protein fold prediction, inferring metabolic or regulatory networks, finding genes within genomes, or even computing a good multiple sequence alignment.
- Needs to know a lot of biology to pose appropriate computational problems.
- Resultant algorithms may not (in some cases) make for interesting or publishable mathematics.
- Note: these are generally new problems because of new data.
What’s needed (for all types)
- Ability to collaborate with a variety of people, and learn what they want to achieve
- Ability to be flexible in terms of how one evaluates research results (e.g., real vs. simulated data, theory versus experiment)
- Ability to communicate research results to different types of researchers
- Ability to use a variety of techniques to solve biological problems
- Ability to model and pose appropriate computational approaches for biological problems
Difficult questions
- What departments should have computational biologists (especially of the second type)?
- Should there be departments of computational biology?
- Should there be PhD programs in computational biology?
- How to evaluate a computational biologist of either type?
Some issues for academic computational biologists
- Journal versus conference papers, and number of each
- Experimental/empirical versus theoretical work
- Software versus papers
- Authorship order within publications
- Promotion and Tenure in two departments?
- Biggest issue: How to predict future success?