Genome Annotation and the landscape of the Human Genome Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.


1 Genome Annotation and the landscape of the Human Genome Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI420 – Introduction to Bioinformatics

2 Genome annotation – Goals: protein coding genes, RNA genes, repetitive elements, GC content

3 The starting material AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

4 Coding genes – ab initio predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
Features labelled on the sequence: start codon, Open Reading Frame (ORF), stop codon, polyA signal
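The ORF idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not how Genscan or any other real gene finder works: it scans the forward strand in three reading frames for an ATG followed in frame by a stop codon. The function name `find_orfs` and the `min_codons` parameter are hypothetical, chosen for this example.

```python
# Minimal forward-strand ORF scan (illustrative sketch only).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Example sequence copied from the slide.
SLIDE_SEQ = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"


def find_orfs(seq, min_codons=2):
    """Return (start, end, orf_sequence) for each ATG...stop stretch.

    Coordinates are 0-based, end-exclusive, and the stop codon is included.
    min_codons is a hypothetical filter on ORF length (in codons).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


orfs = find_orfs(SLIDE_SEQ)
for start, end, orf in orfs:
    print(start, end, orf)
```

On the slide's sequence this reports a single ORF running from the ATG at position 0 to the TAG stop codon, 27 nucleotides in total. A real predictor would also scan the reverse strand and score the candidates statistically.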

5 Ab initio predictions Gene structure

6 Ab initio predictions
…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…
Features labelled on the sequence: splice donor site, splice acceptor site
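As a rough illustration of the splice-site labels above: most introns begin with the dinucleotide GT (donor) and end with AG (acceptor), so a naive scan can list every GT...AG span as a candidate intron. The sketch below does only that. The function name `candidate_introns` and the `min_len` parameter are hypothetical, and the tiny default length is chosen for the toy example rather than for biological realism.

```python
def candidate_introns(seq, min_len=4):
    """List (start, end) spans that begin with GT and end with AG.

    Coordinates are 0-based, end-exclusive. min_len is a hypothetical
    lower bound on span length; real introns are far longer.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # include the AG itself
            if end - d >= min_len:
                spans.append((d, end))
    return spans


# Toy sequence (not from the slide): one GT and one AG.
toy = "CCGTAAAAAGCC"
toy_spans = candidate_introns(toy)
print(toy_spans, [toy[s:e] for s, e in toy_spans])

# The fragment shown on the slide can be scanned the same way.
slide_fragment = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"
print(candidate_introns(slide_fragment))
```

Because GT and AG occur by chance very often, such a scan produces many false candidates. Real gene finders weigh the surrounding sequence context rather than relying on the dinucleotides alone.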

7 Ab initio predictions: Genscan, Grail, Genie, GeneFinder, Glimmer, etc.; EST_genome, Sim4, Spidey

8 Homology based predictions
ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
ACGGAAGTCT: known coding sequence from another organism
GGACTATAAA: expressed sequence
Genes predicted by homology: Genomescan, Twinscan, etc.

9 Consolidation – gene prediction systems: Otto, Ensembl, FgenesH, Genscan, Grail, Genewise, Sim4, dbEST

10 ncRNA genes: prediction is based on structure (e.g. tRNAs); for other novel ncRNAs, only homology-based predictions have been successful

11 Repeat annotations: repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library

12 The landscape of the human genome

13 Gene annotations – # of coding genes

14 Gene annotations – gene length

15 Gene annotations – gene function

16 GC content and coding potential
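GC content, one of the annotation goals listed on slide 2, is simple to compute. The sketch below assumes a plain fixed-size sliding window and ignores any character other than A, C, G, T (the starting-material sequence on slide 3 contains a few X characters). The function names, window size and step are all hypothetical choices for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C among the A/C/G/T characters of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no countable bases in this stretch
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """Return (window_start, gc_fraction) for each full window."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# First line of the starting-material sequence from slide 3.
fragment = "AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA"
for start, gc in windowed_gc(fragment, window=15, step=15):
    print(start, round(gc, 2))
```

Plotting such window values along a chromosome gives the kind of GC profile that can then be compared with gene density.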

17 ncRNAs

18 Segmental duplication

19 Repeat elements

20 Genes and repeats

21 Physical vs. genetic map (MB/cM)

22 Synteny (human-mouse)

23 Gene duplication – paralogs

24 Gene classes across organisms

25 Gene conservation across organisms

26 Human SNPs – polymorphism rate: in different regions of given lengths; at the scale of the chromosomes

27 Human SNPs – polymorphism rate: G+C nucleotide content, CpG di-nucleotide content, recombination rate, functional constraints

Region            Polymorphism rate
3’ UTR            5.00 x 10^-4
5’ UTR            4.95 x 10^-4
Exon, overall     4.20 x 10^-4
Exon, coding      3.77 x 10^-4
  synonymous      366 / 653
  non-synonymous  287 / 653
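The figures on this slide can be turned into a couple of derived quantities with simple arithmetic. The sketch below only restates the slide's numbers. The variable names are hypothetical, and it assumes the rates share a common unit (the slide does not state one), so that their ratios are meaningful.

```python
# Figures transcribed from the slide; a shared unit for the rates is assumed.
RATES = {
    "3' UTR": 5.00e-4,
    "5' UTR": 4.95e-4,
    "Exon, overall": 4.20e-4,
    "Exon, coding": 3.77e-4,
}
SYNONYMOUS, NON_SYNONYMOUS, CODING_TOTAL = 366, 287, 653

# Share of coding SNPs in each class.
syn_fraction = SYNONYMOUS / CODING_TOTAL
nonsyn_fraction = NON_SYNONYMOUS / CODING_TOTAL

# Each region's rate relative to the 3' UTR, the highest rate listed.
relative_to_3utr = {
    region: rate / RATES["3' UTR"] for region, rate in RATES.items()
}

print(f"synonymous: {syn_fraction:.3f}, non-synonymous: {nonsyn_fraction:.3f}")
for region, ratio in relative_to_3utr.items():
    print(f"{region}: {ratio:.3f} x the 3' UTR rate")
```

With these numbers, about 56% of the coding SNPs are synonymous, and the coding-exon rate is roughly three quarters of the 3’ UTR rate, consistent with the "functional constraints" item on the slide.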

28 Human SNPs – polymorphism rate

