Hilary Term 04: “The Human Genome”

20.1 The Human Genome – evolutionary issues (Hein)
27.1 Non-Genic Selection in the Human Genome (Lunter)
3.2 Mammalian Genes I: Conservation and slow evolution (Ponting)
10.2 Mammalian Genes II: Functional innovation and rapid change (Ponting/Goodstadt)
17.2 RNAs in Human Genome (Sam Griffiths-Jones)
24.2 Population Genetics of the Human Genome (Gil McVean)
2.3 Association Mapping and the Human Genome (Lon Cardon)
9.3 The Human Genome and Human Evolution (Chris Tyler-Smith)

The Human Genome – key issues
The Human Genome Project
Few basic facts of the human genome
Grammar of Genes
Basic events happening to a genome per mitosis/generation
Genealogical Structures: Phylogenies, Pedigrees and the ARG
Long term Dynamics of the Human Genome: The comparative aspect
(Genotype → Phenotype) & (Population Genetics/History) => Gene Mapping
History
Our interests

History of the Human Genome Project (Strachan and Read, HMG3 p)
Physical map: 24 types and total set of 46 chromosomes
1977 Sanger publishes dideoxy sequencing method
1980 Botstein proposes human genetic map using RFLPs
1987 US DOE publishes report discussing HGP
1988 HUGO is established
1990 Official start of HGP with 3 billion $ and a 15 year horizon
Genome Database GB is established
1992 Genethon publishes map based on microsatellites
Lander et al. detailed map based on sequence tagged sites
Comprehensive map based on gene markers
Sanger Centre publishes chromosome
Draft Genome published: Celera & Public
2003 Completion (almost) of Human Genome

Sequencing Strategies (from Myers 99)
[Figure: public effort strategy and Celera strategy.]
Celera's view of the International Consortium (IC): unfair competition, with IC delivering the same goods but with state funding.
International Consortium's view of Celera: unfair competition, with Celera delivering the same goods but able to use IC data, while IC cannot use Celera data.

Other Genome Projects
1976/79 First viral genome: MS2/φX
Mitochondrion
1982 First shotgun sequenced genome: Bacteriophage lambda
1995 First prokaryotic genome: H. influenzae
1996 First unicellular eukaryotic genome: Yeast
1998 The first multicellular eukaryotic genome: C. elegans
2000 Drosophila melanogaster
2000 Arabidopsis thaliana
2001 Human Genome
2002 Mouse Genome
The Genome OnLine Database knows of 958 genome sequencing projects, of which 169 are completed.

Favourite and Model Organisms (Strachan and Read (2004) Chapter 8)
Multicellular Animals
Mammals: Human 3.5 Gb, Mouse 3.2 Gb, Cow 3.0 Gb, Dog 2.8 Gb, Rat 3.1 Gb, Chimp 3.5 Gb, Pig 3.0 Gb
Fish: Puffer Fish 0.4 Gb, Zebra Fish 1.9 Gb
Insects: Drosophila 165 Mb, Honey Bee 270 Mb, Yellow Fever Mosquito 780 Mb, Malaria Mosquito 278 Mb
Birds: Chicken 1.2 Gb
Frog: Xenopus laevis 1.7 Gb
Nematodes: Caenorhabditis elegans 100 Mb, Caenorhabditis briggsae 80 Mb
Sea Urchin: Strongylocentrotus purpuratus 800 Mb
Multicellular Plants: Arabidopsis thaliana 125 Mb, Rice 430 Mb

The Human Genome I (R. Harding; HMG (2004) p 245)
[Figure: the genome at successive magnifications, from 3.2*10^9 bp (chromosomes, X, Y, mitochondria) through 6*10^4 bp and 3*10^3 bp (globin gene on chromosome 11: 5' flanking, Exon 1, Exon 2, Exon 3, 3' flanking) down to 30 bp of DNA, ATTGCCATGTCGATAATTGGACTATTTGGA, and the corresponding protein; myoglobin and globin shown.]

The Human Genome II (Strachan and Read (2004) Chapter 9)
Feature | Nuclear Genome | Mitochondria
Highly conserved, coding | 1.5% | 93%
Highly conserved, other | 3.5% | 5%
Transposon based repeats | 45% | -
Heterochromatin | 6.6% | -
Other non-conserved | 44% | 2%
Inheritance | Mendelian | Maternal
 | 1 (typically) | Possibly thousands
Recombination | Recombination | No recombination
Gene density | 1/130 kb | 2 kb
Pseudogenes: Processed Pseudogenes

The Human Genome III (Strachan and Read (2004) Chapter 9 + Lander et al. (2001))
Gene families
Clustered: globins (7), growth hormone (5), Class I HLA heavy chain (20), ...
Dispersed: Pyruvate dehydrogenase (2), Aldolase (5), PAX (>12), ...
Clustered and Dispersed: HOX (38 – 4), Histones (61 – 2), Olfactory receptors (>900 – 25), ...
Transposons

Genes and Gene Structures I (Jobling, Hurles & Tyler-Smith (2004) HEG p 29 + HMG chapt. 9)
Presently estimated gene number: (reference: )
Average gene size: 27 kb
Largest gene: Dystrophin, 2.4 Mb, 0.6% coding, 16 hours to transcribe
Shortest gene: tRNA TYR, 100% coding
Largest exon: ApoB exon 26 is 7.6 kb; smallest: <10 bp
Average exon number: 9; largest exon number: Titin, 363; smallest: 1
Largest intron: WWOX intron 8 is 800 kb; smallest: 10s of bp
Largest polypeptide: Titin; smallest: tens (small hormones)
Intronless genes: mitochondrial genes, many RNA genes, Interferons, Histones, ...
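The dystrophin figures above imply an average transcription speed and a coding length. The following is a minimal arithmetic sketch, assuming the 16 hours covers the whole 2.4 Mb gene at a constant rate; the constant and function names are illustrative and not from the slides.

```python
# Figures quoted on the slide for dystrophin.
GENE_LENGTH_BP = 2.4e6       # 2.4 Mb
TRANSCRIPTION_HOURS = 16     # time to transcribe the gene
CODING_FRACTION = 0.006      # 0.6% coding


def transcription_rate_nt_per_s(length_bp, hours):
    """Average elongation rate implied by gene length and transcription time."""
    return length_bp / (hours * 3600)


rate = transcription_rate_nt_per_s(GENE_LENGTH_BP, TRANSCRIPTION_HOURS)
coding_bp = GENE_LENGTH_BP * CODING_FRACTION

print(f"implied rate: {rate:.1f} nt/s")
print(f"coding sequence: {coding_bp:.0f} bp")
```

Under these assumptions the rate comes out at roughly 40 nucleotides per second, and the coding part is a small fraction of the gene.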

Genes and Gene Structures II (Strachan and Read (2004) Chapter 9 p 258)
Genes within genes: intron 26 of neurofibromatosis type I (NF1) contains 3 internal genes (2 exons each) transcribed in the opposite direction.
Overlapping genes: Class III region of HLA.

Alternative Splicing
Cartegni, L. et al. (2002) “Listening to Silence and understanding nonsense: Exonic mutations that affect splicing” Nature Reviews Genetics; HMG p
1. A challenge to automated annotation.
2. How widespread is it?
3. Is it always functional?
4. How does it evolve?

RNAs in the Genome (Strachan and Read (2004) p. 247 F9.4)
~200 snoRNA: small nucleolar, over 100 types; RNA modification and processing
~100 snRNA: small nuclear; involved in splicing
~200 miRNA: very small (~22 bp); regulation
~175 28S, 5.8S, 5S: large cytosolic subunit
~175 18S: small mitochondrial subunit
~250 5S: large mitochondrial subunit
>500 tRNA: transfer RNA
>1500 Antisense RNA: >1500 types

Genome Annotation Ensembl Santa Cruz Genome Browser Genomes Proteins ESTs

Gene Finding and Protein (HMM) Descriptors (Burge & Karlin, JMB 96)
A. Assign gene characteristics to each nucleotide, then extract a legal prediction by dynamic programming.
B. Use an HMM to describe biological knowledge of gene structure.
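The two ideas on this slide, an HMM for gene structure and dynamic programming to extract a prediction, can be illustrated with a toy Viterbi decoder. This is a sketch only: the two states and every probability below are hypothetical values chosen for illustration, and it is far simpler than the model of Burge & Karlin.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters, chosen only for illustration.
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space dynamic programming)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][base]))
        backpointers.append(pointer)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


path = viterbi("ATATATGCGCGCGCGC")
print(path)
```

A realistic gene finder would add states for exons, introns, splice sites and reading frame, so that only "legal" gene structures receive non-zero probability.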

Mutations and Mutation Rates (1 mitosis or generation)
Average number of mitoses: male generation (15: :150; female generation: ~24
Single nucleotide substitutions: ~10^-7
Microsatellites (~ ): ~10^-2
Small insertion deletions: ~10^-8
Crow, J.F. (2000) “The Origins, Patterns and Implications of Human Spontaneous Mutation” Nature Reviews Genetics; Strachan and Read (2004) chapter 11; Jobling, Hurles and Tyler-Smith (2004) chapter 2

Recombination (1 meiosis)
Recombination: total haploid length, males 25.9 M, females 44.6 M.
Gene conversion: gene conversions 1-2 orders higher. Length bp.
Lander et al. (2001) “Initial sequencing and analysis of the human genome” Nature
Kong, A. et al. (2002) “A high resolution recombination map of the human genome” Nature Genetics
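The map lengths above can be turned into an average recombination rate per megabase. The sketch below assumes the 3.2*10^9 bp genome size quoted on an earlier slide and a uniform rate along the genome, which is a simplification; the names are illustrative.

```python
# Values quoted on the slides: genetic map lengths in Morgans, genome size in bp.
MALE_MAP_MORGANS = 25.9
FEMALE_MAP_MORGANS = 44.6
GENOME_SIZE_BP = 3.2e9


def cm_per_mb(map_length_morgans, genome_size_bp):
    """Average recombination rate in centimorgans per megabase."""
    centimorgans = map_length_morgans * 100
    megabases = genome_size_bp / 1e6
    return centimorgans / megabases


male = cm_per_mb(MALE_MAP_MORGANS, GENOME_SIZE_BP)
female = cm_per_mb(FEMALE_MAP_MORGANS, GENOME_SIZE_BP)
sex_averaged = (male + female) / 2

print(f"male: {male:.2f} cM/Mb")
print(f"female: {female:.2f} cM/Mb")
print(f"sex-averaged: {sex_averaged:.2f} cM/Mb")
```

The result is on the order of 1 cM per Mb, with the female rate higher than the male rate.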

Selection: Positive & Negative
[Figure: a one-sequence scenario and a population scenario in which a site changes from A to C; a second one-sequence scenario follows a two-codon sequence and its encoded amino acids (Thr, Ser, Pro, Arg, Ala) through successive substitutions: ACGTCA, ACGCCA, ACGCCG, AGGCCG, ACTCTG, GCTCTG, GCACTG.]
Certain events have functional consequences and will be selected out. The strength and localization of this selection is of great interest. The selection criteria could in principle be anything, but selection against amino acid changes is by far the most important.

The Genetic Code
Substitutions | Number | Percent
Total in all codons
Synonymous
Nonsynonymous
Missense
Nonsense | 23 | 4
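The categories in this table can be counted directly from the standard genetic code by enumerating every single-nucleotide change of every sense codon. This is a sketch under the assumption that the table counts changes starting from the 61 sense codons only; the function names are illustrative. The nonsense count it produces can be compared with the surviving row of the table (23, about 4%).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(codon, mutant):
    """Classify a change from a sense codon to another codon."""
    before, after = CODE[codon], CODE[mutant]
    if before == after:
        return "synonymous"
    return "nonsense" if after == "*" else "missense"


def count_substitutions():
    """Count all single-nucleotide substitutions starting from sense codons."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons are not used as starting points
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    counts[classify(codon, mutant)] += 1
    return counts


counts = count_substitutions()
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind:12s}{n:5d}{100 * n / total:6.0f}%")

# Two changes to the Thr codon ACG, as in the selection example.
print(classify("ACG", "ACT"), classify("ACG", "AGG"))
```

Nonsynonymous substitutions are the sum of the missense and nonsense counts.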

Examples of rates (remade from Li, 1997)
Organism | Gene | Syno/year | Non-Syno/year
RNA Virus: Influenza A, Hemagglutinin; Hepatitis C, E; HIV 1, gag
DNA virus: Hepatitis B, P; Herpes Simplex, Genome
Nuclear Genes: Mammals, c-mos; Mammals, α-globin; Mammals, histone

Genealogical Structures
Homology: the existence of a common ancestor (for instance for 2 sequences).
Phylogeny; Pedigree; Ancestral Recombination Graph (the ARG). Example sequences: ccagtcg, ccggtcg, cagtct.
Only finding common ancestors. Only one ancestor.
i. Finding common ancestors.
ii. A sequence encounters recombinations.
iii. A “point” ARG is a phylogeny.

Populations Now Parents Grand parents

Genealogical approach to Population Variation Analysis Africa Non-Africa Inter.SNP Consortium (2001): A map of human genome sequence variation containing 1.42 million SNPs. Nature

Pedigrees
Icelandic: Helgason, A. et al. (2003 June) “A population-wide coalescent analysis of Icelandic matrilineal and patrilineal genealogies: Evidence for a faster evolutionary rate of mtDNA lineages than Y-chromosomes” American Journal of Human Genetics
Chinese
Burke's British Peerage
Mormons
Quebec French: Heyer and Tremblay, 1998, PNAS
Total Pedigree

Genealogical Questions Pedigrees Time back to first individual common ancestor to everyone ARG questions: The height of ARGs - correlation between local phylogenies Gene Phylogeny Questions Total Branch Length - Height

Long Term Evolutionary History: Myr/Gyr
Origin of Life
Last Universal Common Ancestor (LUCA)
First Eukaryotes
First Chordates
First Vertebrates
First Mammals
First Primates
First Hominoids
Chimp-Human Split
Hedges, S.B. (2002) “The Origin and Evolution of Model Organisms” Nature Reviews Genetics
Brown (2003) “Horizontal Genetic Transfers” Nature Genetics

The Comparative Aspect
[Figure: two observable sequences (ATTGCGTATATAT….CAG) connected through an unobservable evolutionary path and their unobservable MRCA (Most Recent Common Ancestor); time direction indicated. Parameters: time, rates, selection.]
3 Problems:
i. Test all possible relationships.
ii. Examine unknown internal states.
iii. Explore unknown paths between states at nodes.

One Principle of Comparative Genomics (Goldman, Thorne & Jones, 96)
[Figure: observable sequence (U C G A C A U A C) and unobservable structure: RNA Structure, Gene Structure, Protein Structure.]

Molecular Evolution and Gene Finding: Two HMMs
[Figure: Simple Prokaryotic and Simple Eukaryotic gene models.]
AGTGGTACCATTTAATGCG.....
AGTGGTACTATTTAGTGCG.....
P coding {ATG-->GTG} or P non-coding {ATG-->GTG}

The Rise of Comparative Genomics Lander et al(2001) Figure 25A

The Domain of Comparative Genomics
General theme: formal model of structure; stochastic model of structure evolution.
Sequences: ACTGT, ACTCCT
RNA (Secondary) Structure
Protein Structure: Renin, HIV proteinase
Gene Order/Orientation: Cabbage, Turnip
Gene Structure
Interaction Networks
Any Graph

Linkage Mapping r M D From McVean

Association/Fine scale mapping
Genotype → Phenotype
A set of characters. Binary decision (0,1). Quantitative Character.
Dominant/Recessive. Penetrance. Spurious Occurrence. Heterogeneity.
2N_e generations

Single marker association; Bayesian analysis
BRCA2 example: 1000 cases and 1000 controls typed at 8 microsatellite markers.
Rafnar et al. (2004); Morris et al. (2001)
+ Causative SNPs.

Short Term Evolutionary History: Kyr/Myr
Oldest Polymorphisms
Neutral Human Autosomal Polymorphisms
First Out-of-Africa
Anatomically Modern Man
Peopling of the Globe: genetic and fossil evidence. The globe & migrations: Cavalli-Sforza, HEG (2004)
Supposedly well behaved populations: Iceland, Finland, Sardinia

HapMap: “The International HapMap Project” Nature 426, (18 Dec 2003). Started October 27-29, 2002.

HapMap

Ontologies
Gene Ontology Consortium (2001) “Creating the Gene Ontology Resource: Design and Implementation” Genome Research
Gene Ontology Consortium (2004) “The Gene Ontology (GO) database and informatics resource” Nucleic Acids Research 32.D
A structured vocabulary, consistent across species.
Purpose: facilitate communication among researchers; facilitate communication among computer systems.
2001: three ontologies: Molecular Function, Biological Process, Cellular Component.
Source: NAR (2004) 32.D258-

Structural Genomics: Systematic Structure Determination
John Westbrook, Zukang Feng, Li Chen, Huanwang Yang and Helen M. Berman “The Protein Data Bank and structural genomics” Nucleic Acids Research, 2003, Vol. 31, No
PDB Holdings List: 10-Feb-2004
[Table: holdings by molecule type (Proteins, Peptides, and Viruses; Protein/Nucleic Acid Complexes; Nucleic Acids; Carbohydrates; total) and by experimental technique (X-ray Diffraction and other; NMR; Total).]
Examples:
Center for Eukaryotic Structural Genomics
Structural Genomics of Pathogenic Protozoa Consortium
Berkeley Structural Genomics Center: Mycoplasma genitalium and Mycoplasma pneumoniae

Structural Genomics: Mycoplasma pneumoniae proteins

Proteomics
Hanash, S. (2003) “Disease Proteomics” Nature
Aebersold, R. and M. Mann (2003) “Mass spectrometry-based proteomics” Nature
Gavin et al. (2002) “Functional Organisation of the Yeast Proteome by systematic analysis of protein complexes” Nature
2D PAGE gels (polyacrylamide gel electrophoresis); MALDI; Protein Micro-arrays
Source: Hanash (2003). Source: Gavin et al. (2002).

Summary
The Genome
Genomes: Variation and long term evolution.
Genealogical Structures: Phylogenies, Pedigrees and the ARG
Long term Dynamics of the Human Genome: The comparative aspect
(Genotype → Phenotype) & (Population Genetics/History) => Gene Mapping

Our Genomically Motivated Projects
1. Comparative gene annotation (Meyer, Skou Pedersen)
2. Superimposed selective constraints (Forsberg, Meyer, Skou Pedersen) *
3. Haplotype Blocks (Song) *
4. Genome transformations (Miklos)
5. Ancestral Blocks *
6. Statistical Sequence Comparison (Drummond, Lunter, Miklos)
7. Substitutions and insertion-deletions at the Genome Level (Lunter)
Next week

Minimal ARGs and Haplotype Blocks (Song)
a: (3,4) b: (3,4) c: (15,16) d: (16,17) e: (35,36) f: (35,36) g: (36,37)
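One standard way to find site pairs between which a recombination must have occurred is the four-gamete test: under an infinite-sites mutation model, two binary sites that show all four combinations 00, 01, 10 and 11 cannot be explained by a single tree. The sketch below illustrates that test on hypothetical data. Whether the pairs listed on the slide were obtained this way is an assumption, and the function name and haplotypes are invented for illustration.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return 1-based site pairs (i, j) at which all four gametes occur."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i + 1, j + 1))
    return pairs


# Hypothetical binary haplotypes (one string per chromosome, one character per SNP).
haplotypes = ["00000", "00110", "00010", "00100", "11000"]
print(incompatible_pairs(haplotypes))  # prints [(3, 4)]
```

Each incompatible pair marks an interval that needs at least one recombination, which gives a lower bound on the number of recombinations in any ARG for the data and suggests candidate block boundaries.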

Combining Levels of Selection (Forsberg, Meyer, Pedersen)
Protein-Protein: Hein & Støvlbæk, 1995; Codon, Nucleotide; Independence Heuristic; Jensen & Pedersen, 2001; Contagious Dependence.
Assume multiplicativity: f_{A,B} = f_A * f_B
Protein-RNA: Doublets, Singlet; Contagious Dependence.

Applications to Human Genome (Wiuf and Hein, 97)
A randomly picked ancestor (ancestral material comes in batteries!)
Parameters used: 4N_e; Chromosome 1: 263 Mb, 263 cM.
[Figure and table: Chromosome 1: Segments, Ancestors; All chromosomes: Ancestors; Physical; Population; Mill.]

References: Books & www-pages.
Books:
Strachan and Read (2004) “Human Molecular Genetics” (3rd Ed.) Bioscience
Jobling, Hurles and Tyler-Smith (2004) “Human Evolutionary Genetics” Bioscience
Sulston, J. (2002) “The Common Thread” Corgi Books
Ridley, Matt (2001) “Genome”
“Encyclopedia of the Human Genome” (2003) Nature Publishing Group
Cavalli-Sforza, L. (2001) “Genes, Peoples and Languages” Penguin
Key articles:
Lander et al. (2001) “Initial Sequencing and Analysis of the Human Genome” Nature
Venter et al. (2001) “The Sequence of the Human Genome” Science

References: www-pages.
Major sequencing centers:
Baylor College of Medicine Genome Sequencing Center (hgsc.bcm.tcm.edu/)
Celera
DoE Joint Genome Institute
Genoscope
TIGR
Washington University Genome Sequencing Center
Wellcome Trust Sanger Institute
Whitehead Institute/MIT Center for Genome Research
Ensembl genome annotator
European Bioinformatics Institute
NCBI
Nature Genome Gateway
Integrated Genomics
EBI genome databases
Primate Sequencing Projects
European Bioinformatics Institute Proteomics
National Center for Biotechnology Information
HapMap Project Homepage
Online Mendelian Inheritance in Man