Protein Modularity and Evolution : An examination of organism complexity via protein domain structure Presented by Jennelle Heyer and Jonathan Ebbers December.

Presentation transcript:

Protein Modularity and Evolution : An examination of organism complexity via protein domain structure Presented by Jennelle Heyer and Jonathan Ebbers December 7, 2004

Presentation Outline
Background Material
– Protein Evolution, Theory of Domains, Gene Number Hypothesis
– Using a model protein family
Procedure/Methods
– DPIP Program, Phylogenetic Analysis
Results
Discussion/Conclusions

Theories of Protein Evolution
A long time ago, in the primordial soup of life, small polypeptides began to form …
HDLC or TCP or ….
HDLC + TCP = HCLCTCP
HCI*CTCP + TCP … Functional proteins
HDLC or TCP or ….
HDLC + TCP = HCLCTCP
HCI*CTCP + QZX … Functional proteins

Concept of Modularity
Proteins consist of one or more domains that were pieced together over time
Domain: the building block of proteins
– Defined as “spatially distinct structures that could conceivably fold and function in isolation” (Ponting and Russell, 2002)
– Dictates the function of the protein
– Under evolutionary pressure to be conserved (sequence and/or structure)

Organismal Complexity
The nematode, C. elegans, has 19,500 genes in its genome
Humans have between 20,000 and 25,000 genes in their genome
HOW CAN THAT BE?
Alternative splicing, multi-functional/network proteins

Hypothesis
Gene products, proteins, can be multi-functional with the introduction of domains
“… evolution does not produce innovation from scratch. It works on what already exists, either transforming a system to give it a new function or combining several systems to produce a more complex one” (Jacob, 1946)
More complex or phylogenetically derived organisms produce proteins with greater domain complexity

Hypothesis Part II
Create a protein domain “tool”
– Position
– Partner domain
– General organization
– Protein evolution
– Using a variety of sequenced genomes
Allow investigators to learn about a domain of interest and apply it to their research

Kinesins: A model protein family
Motor proteins found in eukaryotic organisms
Contain a conserved motor domain
Bind and walk along microtubules
Can carry a variety of “cargo”
May contain multiple domains

Kinesins: A model protein family
Arabidopsis thaliana, a model plant species, contains 61 kinesins
S. pombe: 10, C. elegans: 22, Drosophila: 25, human and mouse: ~45
From Reddy and Day, 2001

Programming Approach
Two programs used, BLAST and InterProScan, held together with Perl scripts
Give a domain sequence to PSI-BLAST, which will identify proteins that have that domain.
One by one, give those protein sequences to InterProScan, which identifies the domains in each protein.
Create a listing of proteins and map the data onto a phylogeny.
Create a tree based on the phylogeny and domains.
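
The steps above can be sketched as a small driver loop. This is a minimal illustration, not the DPIP Perl code: the function name `run_pipeline` and the two callables `find_proteins` (standing in for the PSI-BLAST step) and `find_domains` (standing in for the InterProScan step) are hypothetical, and the toy stand-ins return made-up identifiers rather than calling the real tools.

```python
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    domain_seq: str,
    find_proteins: Callable[[str], Iterable[str]],
    find_domains: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each protein found for `domain_seq` to its list of domains."""
    listing: Dict[str, List[str]] = {}
    for protein_id in find_proteins(domain_seq):   # step 1: similarity search
        if protein_id not in listing:              # ignore repeated hits
            listing[protein_id] = find_domains(protein_id)  # step 2: domain scan
    return listing


# Toy stand-ins (hypothetical data, no external programs are run).
def fake_search(seq: str) -> List[str]:
    return ["kinA", "kinB", "kinA"]


def fake_scan(protein_id: str) -> List[str]:
    return {"kinA": ["motor"], "kinB": ["motor", "FHA"]}[protein_id]


listing = run_pipeline("MOTORDOMAINSEQ", fake_search, fake_scan)
print(listing)
```

Passing the two search steps in as functions keeps the loop independent of how the external programs are actually invoked, which the slides do not specify.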

Program Flowchart
Domain Sequence → [BLAST] → List of proteins with similar domains → [InterProScan] → List of domains in every protein → [Maketree] → Tree (includes domains)

Program Details
Database selection:
– BLAST: RefSeq over nr
– InterProScan: SMART database only
Threshold values:
– BLAST: Option to change, improve resolution
– InterProScan: E-value at 0.99, up from 0.01
Used Arabidopsis sequences as a control
Name: DPIP (Domain Placement in Proteins)
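
To illustrate what moving an E-value cutoff from 0.01 to 0.99 does, here is a minimal sketch of threshold filtering. The `Hit` record, the `filter_hits` helper, and the example hits are all hypothetical; only the two cutoff values come from the slide.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    protein: str
    domain: str
    evalue: float  # expected number of equally good hits arising by chance


def filter_hits(hits: List[Hit], max_evalue: float) -> List[Hit]:
    """Keep only hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h.evalue <= max_evalue]


# Made-up hits for illustration only.
hits = [
    Hit("kinA", "motor", 1e-50),
    Hit("kinA", "FHA", 0.2),
    Hit("kinB", "PH", 0.8),
]

strict = filter_hits(hits, 0.01)  # original cutoff
loose = filter_hits(hits, 0.99)   # relaxed cutoff from the slide

print(len(strict), len(loose))
```

A relaxed cutoff reports more candidate domains per protein, at the cost of admitting weaker matches that are more likely to be chance hits.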

Results
A Quick Look at the Data
Phylogenetic Approach
– Hypothesis I
Qualitative Approach
– Hypothesis II

A Quick Look

Phylogenetic Approach
“More complex or phylogenetically derived organisms produce proteins with greater domain complexity”
Trace domain characteristics on a preset tree
– Use MacClade tree drawing software
– Uses input data to create the most parsimonious trace
Characteristics:
– Maximum # of domains
– Unique domains
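
As a rough picture of what a most parsimonious trace on a preset tree involves, the sketch below runs the bottom-up pass of Fitch parsimony on a fixed binary tree. This is an assumption about the general technique, not a description of MacClade's settings, which the slides do not give. The tree shape, the taxon names A to E, and the character values are invented for illustration.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a binary tree of nested 2-tuples.

    Leaves are taxon names (str). Returns the set of candidate states at
    this node and the minimum number of state changes in the subtree.
    """
    if isinstance(tree, str):            # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                           # children agree: no extra change
        return shared, left_cost + right_cost
    # children disagree: keep both options and count one change
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical preset tree and character (maximum domains per protein).
tree = ((("A", "B"), "C"), ("D", "E"))
max_domains = {"A": 3, "B": 3, "C": 1, "D": 1, "E": 3}

root_states, changes = fitch(tree, max_domains)
print(root_states, changes)
```

Fitch parsimony treats states as unordered, so a change from 1 to 3 costs the same as a change from 1 to 2; an ordered character would need a different cost scheme.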

Maximum # of Domains per Protein Green = 1 Black = 3

Number of Unique Domains per Organism Blue = 1 Pink = 2 Dk. Blue = 3 Yellow = 5 Black = 6 Dash - ???
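
The two characters traced in these figures could be derived from a per-organism domain listing along the following lines. This is a sketch under assumptions: the nested dictionary layout and the function name `domain_characters` are hypothetical, repeated domains within one protein are counted toward the maximum, and the example data are invented.

```python
from typing import Dict, List, Tuple


def domain_characters(
    proteins_by_organism: Dict[str, Dict[str, List[str]]]
) -> Dict[str, Tuple[int, int]]:
    """Return {organism: (max domains in any one protein, unique domains)}."""
    result = {}
    for organism, proteins in proteins_by_organism.items():
        # Largest domain count found in a single protein (0 if no proteins).
        max_per_protein = max((len(d) for d in proteins.values()), default=0)
        # Distinct domain names seen anywhere in this organism's proteins.
        unique = set().union(*proteins.values())
        result[organism] = (max_per_protein, len(unique))
    return result


# Made-up listing for illustration only.
example = {
    "orgA": {"p1": ["motor"], "p2": ["motor", "FHA", "PH"]},
    "orgB": {"p3": ["motor"]},
}

characters = domain_characters(example)
print(characters)
```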

Phylogenetic Conclusions
Inconclusive, or null hypothesis supported
Possible explanations:
– Kinesins may have limited domain complexity due to function or folding
– Inherent bias in DPIP (RefSeq database)
Future Work:
– Testing other domains through the same process
– Updating the database
– Including a measure for position (N/I/C)

Qualitative Approach
Create a protein domain “tool”
– Position
– Partner domain
– General organization
– Protein evolution
– Using a variety of sequenced genomes
Compile data into a more informative table
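
One way such a table row might be built is sketched below. It assumes that N/I/C means N-terminal, internal, and C-terminal, and it uses the order of domains along the protein as a stand-in for position. The function name `describe_domain`, the "only" label for single-domain proteins, and the example proteins are hypothetical.

```python
from typing import Dict, List, Optional


def describe_domain(domains: List[str], target: str) -> Optional[Dict[str, object]]:
    """Summarise the position and partners of `target` in an N-to-C domain list."""
    if target not in domains:
        return None
    index = domains.index(target)        # first occurrence of the target
    if len(domains) == 1:
        position = "only"                # no partners, so N/I/C is not meaningful
    elif index == 0:
        position = "N"
    elif index == len(domains) - 1:
        position = "C"
    else:
        position = "I"
    partners = sorted(set(domains) - {target})
    return {"position": position, "partners": partners}


# Made-up proteins for illustration only.
table = {
    "kinA": describe_domain(["motor"], "motor"),
    "kinB": describe_domain(["motor", "FHA"], "motor"),
    "kinC": describe_domain(["FHA", "motor", "PH"], "motor"),
}
for protein, row in table.items():
    print(protein, row)
```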

- Can I trace domain or protein evolution??

Presence of FHA/PH domain in kinesins Yellow – Absent Blue - Present

Conclusions
The DPIP program was created to answer two questions:
– Does organismal complexity correspond with protein complexity?
– Can we create a tool for researchers to better understand domains in protein families?
For kinesin motor domains: No and Yes
For other domains: ????
Thanks to Webb Miller, Richard Cyr, Claude DePamphillis, Alexander Richter, and the Plant Physiology, Biology, and Bioinformatics Depts.