Basics of Comparative Genomics

"Nothing in (computational) biology makes sense except in the light of evolution" (after Theodosius Dobzhansky, 1970)

Dr G. P. S. Raghava

AIM: To understand the biology of organisms.
Importance: More than 100 genomes sequenced, more than 250 in progress.
Definition: Comparison of the set of proteins of one genome with that of another genome, plus comparison of gene location, gene order and gene regulation.
Applications:
- Visualization of genome information
- Genome annotation (prediction of genes, repeats and regulatory regions)
- Evolutionary information (gene loss, duplication, horizontal gene transfer, ancestry)
- Identification of genes essential for cell survival
- Classification of genes based on function
Tools and Databases

What is comparative genomics? Analyzing and comparing genetic material from different species to study evolution, gene function and inherited disease, and to understand what makes each species unique.

Why comparative genomics? It tells us what is common to different species and what is unique to each of them at the genome level. Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions, e.g., to distinguish orthologs from paralogs. The functions of human genes and other DNA regions can be revealed by studying their counterparts in simpler model organisms.

What is compared?
- Gene location
- Gene structure: exon number, exon lengths, intron lengths, sequence similarity
- Gene characteristics: splice sites, codon usage, conserved synteny
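To make the codon-usage comparison concrete, here is a minimal Python sketch, assuming each gene is given as an in-frame coding sequence in a plain string. The function names codon_usage and usage_distance are hypothetical, and the distance used (sum of absolute differences in codon frequency) is only one simple illustrative choice, not a measure taken from the slides.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Assumes the sequence starts at the first codon; trailing bases that
    do not complete a codon are ignored. RNA input (U) is treated as DNA (T).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def usage_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum of absolute frequency differences over codons seen in either gene."""
    return sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in set(a) | set(b))


if __name__ == "__main__":
    gene1 = codon_usage("ATGGCTGCTTAA")  # Ala encoded as GCT
    gene2 = codon_usage("ATGGCCGCCTAA")  # same protein, Ala encoded as GCC
    print(usage_distance(gene1, gene2))
```

The two toy genes encode the same peptide but prefer different alanine codons, so their usage profiles differ even though the protein sequences are identical.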

A few facts from genome comparison:
- High degree of conservation of microbial proteins (~70% ancestral conserved regions)
- Proteins related to ENERGY processes are generally found in all genomes
- Proteins related to COMMUNICATION represent the most distinctive function in each genome
- INFORMATION-related proteins show complex behaviour
- High frequency (~10%) of non-orthologous gene displacement

A few terms. Homology: the relationship between any two characters (such as two proteins with similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes or proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.

Homologues can be orthologues, paralogues or xenologues. Orthologues are homologues that have evolved from a common ancestral gene by speciation; they usually have similar functions. Paralogues are homologues produced by gene duplication within a genome followed by divergence; they often have different functions. Xenologues are homologues related by an interspecies (horizontal) transfer of the genetic material for one of the homologues; their functions are quite often similar.

Analogues: Analogues are non-homologous genes or proteins that have arisen by convergent evolution from unrelated ancestors. They have similar functions although they are unrelated in both sequence and structure.

Frequently used terms
Homology
- Orthologous: descended from a common ancestral gene by speciation; usually similar functions
- Paralogous: arose by gene duplication within a genome; usually different functions
- Xenologous: related by interspecies (horizontal) gene transfer; similar functions
- Analogous: not evolved from the same ancestor
Similarity: sequence similarity
Percent identity
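Percent identity depends on how alignment columns are counted. The sketch below assumes two already-aligned, equal-length sequences with "-" for gaps, and divides the number of identical positions by the number of alignment columns (columns where both rows are gaps are skipped). The function name is hypothetical; programs differ in their denominator (for example, some use the length of the shorter sequence), so identity values are not always comparable between tools.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Convention assumed here: identical residues / alignment columns,
    ignoring any column in which both rows are gaps.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 0.0
    # A gap aligned to a residue counts as a mismatch.
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # 7 identical positions out of 9 columns (one substitution, one gap)
    print(round(percent_identity("HKIYHLQ-K", "HKVYHLQSK"), 1))
```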

Visualising Genome Information

Genome Annotation: the process of adding biological information and predictions to a sequenced genome framework.

All-against-all self-comparison
How?
- Make a database of the proteome.
- Use each protein as a query in a similarity search against the database (BLAST, WU-BLAST or FASTA).
- Generate a matrix of alignment scores (P or E values), using a conservative E-value cutoff of 10^-6.
Why?
- Number of gene families: this comparison distinguishes unique proteins from proteins that arose by gene duplication, and also reveals the number of gene families.
- Paralogs: significantly matched pairs of protein sequences may be paralogs.
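The all-against-all step can be sketched in Python as single-linkage clustering over significant self-hits, assuming the search output has already been parsed into (query, subject, evalue) tuples, for example from BLAST tabular output, and that every ID in the hits also appears in the proteome list. The name gene_families is hypothetical, and single linkage with a union-find structure is one simple way to count families, not necessarily the procedure used in the original work.

```python
def gene_families(proteins, hits, cutoff=1e-6):
    """Group a proteome into putative gene families by single linkage.

    proteins: iterable of protein IDs in the proteome.
    hits: iterable of (query, subject, evalue) tuples from an
          all-against-all search of the proteome against itself.
    Two proteins are linked when they match with E <= cutoff; each
    connected component is reported as one family.
    Returns (sorted list of families, set of candidate paralog pairs).
    """
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    paralog_pairs = set()
    for query, subject, evalue in hits:
        if query == subject or evalue > cutoff:
            continue  # ignore self-hits and non-significant matches
        paralog_pairs.add(tuple(sorted((query, subject))))
        root_q, root_s = find(query), find(subject)
        if root_q != root_s:
            parent[root_s] = root_q  # merge the two families

    families = {}
    for p in parent:
        families.setdefault(find(p), []).append(p)
    groups = sorted(sorted(members) for members in families.values())
    return groups, paralog_pairs


if __name__ == "__main__":
    proteome = ["p1", "p2", "p3", "p4", "p5"]
    search = [("p1", "p1", 0.0), ("p1", "p2", 1e-30),
              ("p2", "p3", 1e-3), ("p3", "p4", 1e-8)]
    groups, pairs = gene_families(proteome, search)
    print(len(groups), groups)
```

Singleton groups correspond to unique proteins, and the number of groups is the number of gene families. Single linkage can chain unrelated proteins together through a shared domain, which is one reason to use a conservative cutoff.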

Between-proteome comparisons: Why?
To identify orthologs, gene families and domains.
Orthologs (proteins that share a common ancestry and function): a pair of proteins in two organisms that align along most of their lengths with a highly significant alignment score. These proteins perform the core biological functions shared by the two organisms.
Two matched sequences (X in A, Y in B) may not be orthologs: for example, Y and Z may be paralogs in B, with X and Z being the true orthologs.
To identify true orthologs, require:
- the highest-scoring match (best hit)
- E value < 0.01
- more than 60% of both proteins included in the alignment
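The ortholog criteria above (best hit, E value < 0.01, more than 60% of both proteins aligned) can be written as a filter over parsed search results. This is a sketch under assumptions: the Hit fields (bit score, E value and per-protein coverage fractions) are hypothetical names for values extracted from BLAST output, and reciprocal_best_hits adds the widely used reciprocal-best-hit check, which is a common extension rather than something the slide states.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str          # protein ID in the query proteome
    subject: str        # protein ID in the target proteome
    score: float        # alignment (bit) score; higher is better
    evalue: float
    query_cov: float    # fraction of the query covered by the alignment (0-1)
    subject_cov: float  # fraction of the subject covered by the alignment (0-1)


def putative_orthologs(hits, max_evalue=0.01, min_cov=0.60):
    """Map each query to its best hit if that hit passes the E-value and coverage tests."""
    best = {}
    for h in hits:
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h  # keep only the highest-scoring hit per query
    return {
        q: h.subject
        for q, h in best.items()
        if h.evalue < max_evalue and h.query_cov > min_cov and h.subject_cov > min_cov
    }


def reciprocal_best_hits(a_to_b, b_to_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


if __name__ == "__main__":
    a_vs_b = [Hit("X", "Y", 250.0, 1e-40, 0.90, 0.85),
              Hit("X", "Z", 300.0, 1e-50, 0.95, 0.90)]
    b_vs_a = [Hit("Z", "X", 300.0, 1e-50, 0.90, 0.95),
              Hit("Y", "X", 250.0, 1e-40, 0.85, 0.90)]
    print(reciprocal_best_hits(putative_orthologs(a_vs_b), putative_orthologs(b_vs_a)))
```

Running the filter in both directions and keeping only reciprocal pairs addresses the X/Y/Z case: Y's best hit back is X, but X's best hit is Z, so only the X-Z pair survives.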

Between-proteome comparisons: How?
1. Choose a yeast protein and perform a database similarity search of the worm proteome (WU-BLAST): a yeast-versus-worm search.
2. Group the worm sequences that match the yeast query with a highly significant P value (10^-10 to 10^-100); also include the yeast query sequence in the group.
3. From the group made in step 2, choose a worm sequence and search the yeast proteome, using the same P-value limit.
4. Add any matching yeast sequence to the group made in step 2.
5. Repeat steps 3 and 4 for all initially matched sequences in the group.
6. Repeat steps 1-5 for every yeast protein.
7. As in steps 1-6, perform a comparable worm-versus-yeast search.
8. Coalesce the groups of related sequences and remove any redundancies so that every sequence is represented only once.
9. Eliminate any matched pairs in which less than 80% of each sequence is in the alignment.
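The nine steps can be sketched in Python under some assumptions: all search results (yeast-versus-worm and worm-versus-yeast) are precomputed in one dictionary keyed by query ID, each hit carries a P value and the fractions of query and subject covered, and step 9 is read as discarding a matched pair unless at least 80% of both sequences is aligned. The function name two_way_groups and the default p_limit=1e-10 (the lenient end of the slide's range) are illustrative choices, not part of the original procedure.

```python
def two_way_groups(hits, yeast_ids, worm_ids, p_limit=1e-10, min_cov=0.80):
    """Sketch of the yeast/worm grouping procedure (steps 1-9).

    hits: dict mapping a query ID to a list of
          (subject_id, p_value, query_cov, subject_cov) tuples covering
          both yeast-vs-worm and worm-vs-yeast searches.
    Returns a sorted list of (group_members, kept_pairs) tuples.
    """
    def significant(query):
        return [s for s, p, _, _ in hits.get(query, []) if p <= p_limit]

    raw_groups = []
    # Steps 1-6 use yeast queries; step 7 repeats them with worm queries.
    for query in list(yeast_ids) + list(worm_ids):
        first = significant(query)      # steps 1-2: matches in the other proteome
        group = {query, *first}
        for member in first:            # steps 3-5: search back from each match
            group.update(significant(member))
        raw_groups.append(group)

    # Step 8: coalesce overlapping groups so each sequence appears only once.
    merged = []
    for group in raw_groups:
        group = set(group)
        overlapping = [g for g in merged if g & group]
        for g in overlapping:
            group |= g
            merged.remove(g)
        merged.append(group)

    # Step 9: keep matched pairs only if >= min_cov of BOTH sequences is aligned.
    result = []
    for group in merged:
        pairs = {
            tuple(sorted((q, s)))
            for q in group
            for s, p, cov_q, cov_s in hits.get(q, [])
            if s in group and p <= p_limit and cov_q >= min_cov and cov_s >= min_cov
        }
        result.append((sorted(group), sorted(pairs)))
    return sorted(result)


if __name__ == "__main__":
    searches = {
        "Y1": [("W1", 1e-50, 0.90, 0.95), ("W2", 1e-20, 0.85, 0.50)],
        "W1": [("Y1", 1e-50, 0.95, 0.90), ("Y2", 1e-15, 0.90, 0.90)],
        "W2": [("Y1", 1e-20, 0.50, 0.85)],
        "Y3": [("W3", 1e-3, 0.90, 0.90)],
    }
    for members, pairs in two_way_groups(searches, ["Y1", "Y2", "Y3"], ["W1", "W2", "W3"]):
        print(members, pairs)
```

In this sketch, sequences stay in their coalesced group even if none of their pairs survives step 9; dropping such sequences instead is an equally plausible reading of the slide.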

Figure 1   Regions of the human and mouse homologous genes: coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.

Target Validation Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound. Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.