Introduction to Bioinformatics II


1 Introduction to Bioinformatics II
Lecture 6 By Ms. Shumaila Azam

2 Gene: A sequence of nucleotides coding for protein
Gene Prediction Problem: Determine the beginning and end positions of genes in a genome.

3 Gene Prediction: Computational Challenge
aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg Gene!

4 Central Dogma: DNA -> RNA -> Protein
DNA: CCTGAGCCAACTATTGATGAA
  (transcription)
RNA: CCUGAGCCAACUAUUGAUGAA
  (translation)
Protein: PEPTIDE
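The slide's example can be reproduced with a short sketch. It assumes the standard genetic code and, like the slide, treats transcription as a simple T-to-U substitution on the coding strand. The function names `transcribe` and `translate` are illustrative and do not come from any particular library.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace every T with U (simplified model)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```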

5 Gene Prediction
Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. In computational biology, gene prediction (or gene finding) refers to the process of identifying the regions of genomic DNA that encode genes. The targets are:
protein-coding genes
RNA genes
other functional elements such as regulatory regions

6 Gene Prediction
Before genome sequences were available, statistical analysis of the rates of homologous recombination between several different genes could determine their order on a chromosome, and information from many such experiments could be combined into a genetic map specifying the rough location of known genes relative to each other.
Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. Confirming function still relies on in vivo experimentation, for example through gene knockout, although bioinformatics research is making it increasingly possible to predict the function of a gene from its sequence alone.

7 Extrinsic approaches
In extrinsic (or evidence-based) gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of the known sequence of a messenger RNA (mRNA) or protein product.
Given an mRNA sequence, it is trivial to derive a unique genomic DNA sequence from which it had to have been transcribed.
Given a protein sequence, a family of possible coding DNA sequences can be derived by reverse translation of the genetic code.
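To see why a protein yields a family of DNA sequences rather than a single one, the sketch below counts and enumerates the candidates under the standard genetic code. The helper names `count_coding_sequences` and `reverse_translate` are hypothetical, and full enumeration is only practical for short peptides because the count grows multiplicatively with length.

```python
from itertools import product
from math import prod

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, amino_acid in CODON_TABLE.items():
    CODONS_FOR.setdefault(amino_acid, []).append(codon)


def count_coding_sequences(protein):
    """Number of distinct DNA sequences that encode the given protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein.upper())


def reverse_translate(protein):
    """Yield every DNA sequence that translates to the given protein."""
    choices = [CODONS_FOR[aa] for aa in protein.upper()]
    for combination in product(*choices):
        yield "".join(combination)


print(count_coding_sequences("PEPTIDE"))  # 1536
print(list(reverse_translate("MW")))      # ['ATGTGG']
```

The seven-residue peptide from the Central Dogma slide already has 1536 possible coding sequences, which is why search tools generally compare at the protein level instead of enumerating DNA candidates.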

8 Extrinsic approaches
Once candidate DNA sequences have been determined, it is a relatively straightforward algorithmic problem to efficiently search a target genome for matches, complete or partial, and exact or inexact.
BLAST is a widely used system designed for this purpose.
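As a rough illustration of exact versus inexact matching, the sketch below slides a query along a genome and reports every position where the number of mismatches stays within a tolerance. This is a naive scan, not BLAST's algorithm (BLAST uses seeded heuristics and scored alignments with gaps), and the name `find_matches` is hypothetical.

```python
def find_matches(genome, query, max_mismatches=0):
    """Return (position, mismatches) for every ungapped hit of query in genome.

    max_mismatches=0 gives exact matching; larger values allow inexact hits.
    """
    genome, query = genome.upper(), query.upper()
    hits = []
    for start in range(len(genome) - len(query) + 1):
        window = genome[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(find_matches("AATGCATGC", "ATGC"))               # [(1, 0), (5, 0)]
print(find_matches("ATGA", "ATGC", max_mismatches=1))  # [(0, 1)]
```

The scan costs time proportional to genome length times query length, which is workable for teaching examples but far too slow for whole genomes; that gap is the reason indexed, heuristic tools exist.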

9 Ab initio approaches
Ab initio gene prediction is an intrinsic method based on gene content and signal detection. Because of the inherent expense and difficulty in obtaining extrinsic evidence for many genes, it is also necessary to resort to ab initio gene finding.
In this approach, the genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes. These signs can be broadly categorized as either signals (specific sequences that indicate the presence of a gene nearby) or content (statistical properties of the protein-coding sequence itself).

10 Ab initio approaches (prokaryotes)
In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences (signals). In addition, the sequence coding for a protein occurs as one contiguous open reading frame (ORF). Because 3 of the 64 possible codons are stop codons, one would expect a stop codon approximately every 20–25 codons, or 60–75 base pairs, in a random sequence, so an ORF much longer than this is unlikely to arise by chance. These characteristics make prokaryotic gene finding relatively straightforward, and well-designed systems are able to achieve high levels of accuracy.
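A minimal ORF scan along these lines is sketched below. It assumes ATG as the only start codon and the three standard stop codons, checks all six reading frames, and keeps only ORFs of at least `min_codons` codons. The function names and the default threshold are illustrative choices, not the behaviour of any specific ORF-finding tool.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=30):
    """Find ORFs in all six reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and refer to the strand that was scanned, so "-" hits
    are positions on the reverse complement. The stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Codons from the start codon up to (not including) the stop.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


example = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(example, min_codons=3))  # [('+', 2, 2, 23)]
```

Setting `min_codons` well above the 20 to 25 codons expected between random stop codons is what filters out chance ORFs.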

11 Open Reading Frame Finder (Input)

12 Output

13 Ab initio approaches (Eukaryotes)
Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging.
First, the promoter and other regulatory signals in these genomes are more complex and less well understood. Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail.
Second, splicing mechanisms mean that a protein-coding sequence is divided into several parts (exons) separated by non-coding sequences (introns), so a gene no longer appears as one contiguous ORF.
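As one example of a signal, a CpG island can be flagged from simple counts. The sketch below uses the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The slides do not specify these, so treat the thresholds and the function names as assumptions.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence.

    Expected CpG count is (#C * #G) / length, so the ratio is
    (#CpG * length) / (#C * #G).
    """
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC-content, and obs/exp CpG thresholds."""
    if len(seq) < min_length:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # True
print(looks_like_cpg_island("AT" * 100))  # False
```

In practice the test would be applied to sliding windows along a chromosome rather than to one whole sequence.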

14 Combined approaches
Combined approaches use both extrinsic and ab initio methods, mapping protein and EST (expressed sequence tag) data to the genome to validate ab initio predictions.

15 Comparative genomics approaches
As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a comparative genomics approach. It is based on the principle that natural selection causes genes and other functional elements to accumulate mutations at a slower rate than the rest of the genome. Genes can thus be detected by comparing the genomes of related species and looking for conserved regions. This approach was first applied to the mouse and human genomes.
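The conservation idea can be sketched as a per-window identity score over two sequences that are assumed to be already aligned (equal length, with "-" marking gaps). Real comparative gene finders use far richer models. The function name `window_identity` and the non-overlapping window scheme are illustrative choices.

```python
def window_identity(aligned_a, aligned_b, window=50):
    """Return (start, identity) for consecutive non-overlapping windows.

    Both inputs must be the same length (pre-aligned). A column counts as
    identical only if both sequences have the same base and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    scores = []
    for start in range(0, len(a), window):
        columns = list(zip(a[start:start + window], b[start:start + window]))
        identical = sum(1 for x, y in columns if x == y and x != "-")
        scores.append((start, identical / len(columns)))
    return scores


# The first window is fully conserved, the second is not conserved at all.
print(window_identity("AAAACCCC", "AAAAGGGG", window=4))  # [(0, 1.0), (4, 0.0)]
```

Windows whose identity stays well above the genome-wide background are candidates for genes or other functional elements.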

16 GeneMarkS

