Anopheles gambiae: a genetic approach. Karin Eiglmeier, Unité de Biochimie et Biologie Moléculaire des Insectes, Institut Pasteur
Mosquitoes. Order: Diptera. Family: Culicidae. Subfamilies: Toxorhynchitinae, Anophelinae, Culicinae. Genera: Aedes, Culex, Anopheles
Since when? Anopheline fossil (20 Myr) in amber from the Dominican Republic (ber/critters/skeeter-b.html; Dr. David Grimaldi). Cretaceous carnivorous dinosaurs (145-65 Myr). Oldest Culicidae-like fossil: Myr (Canadian amber). From: index.php?id=galeria
Generalized mosquito life cycle: eggs, larvae, pupa, adult. Eggs hatch into larvae within 48 h. Larval stage (days): the larvae feed on microorganisms and organic matter in the water. Pupal stage (1-4 days): a non-feeding stage of development that still reacts to stimuli and ends in metamorphosis. Modified from:
Mosquito facts. Detection of hosts: CO2, temperature, humidity, odor, colour (infrared), movement. Transmission: malaria, West Nile virus, filarial diseases, dengue, encephalitis, yellow fever. Immunity. Blood meal: endophagic (feeds indoors), 4x its weight. Flight: miles/hour, miles, meters
Malaria (French: “paludisme”) - Vector: Anopheles gambiae - Parasite: protozoans of the genus Plasmodium - > 90 countries (40% of the world’s population) - 90% of malaria fatalities occur in sub-Saharan Africa - millions of clinical cases per year - millions of deaths
Plasmodium life cycle (~13-18 days). Adapted from Waters, Science 301 (2003)
Anopheles mosquitoes. Genus: Anopheles. Principal disease-transmitting species by region: Africa: An. gambiae, An. arabiensis, An. funestus; Asia: An. stephensi, An. farauti, An. sinensis, An. tessellatus, An. minimus; America: An. albimanus, An. quadrimaculatus, An. darlingi, An. freeborni. About 70 species transmit malaria to humans; about 20 are important vectors.
Anopheles gambiae complex - An. gambiae M and S - An. arabiensis - An. melas - An. merus - An. bwambae - An. quadriannulatus - An. quadriannulatus B. Adapted from J. Mouchet & D. Fontenille
Anopheles gambiae chromosomes: golden path length ~273 Mb. From:
Whole genome assembly. Genome size: 278 Mb (golden path: 273 Mb). Scaffolds: 91% of the genome; contigs. Inter-scaffold gaps: - sequence gaps with no clones - repeat sequences - smaller scaffolds? Y chromosome: - not assembled (ongoing?) - 0.18 Mb - high repeat content. Current release: AgamP3 (last update 2/2006). First draft: March 2002
Anopheles gambiae chromosomes (figure; sizes in Mb). Adapted from:
Immediate results: 20 genes (1999) vs. genes (2005). Cross-disciplinary research: - Drosophila community - bioinformaticians - other domains - BLAST. Fieldwork: - entomologists - identification and follow-up of mutations - adapting strategy
Gene prediction and annotation. Several lines of evidence: - gene prediction programs: open reading frame; signals (start codon, stop codon, polyadenylation site); splice sites; bias in base composition; bias in base frequency - encoded peptide similar to a known protein - encoded peptide similar to a protein domain or motif - "evolutionarily conserved sequences" (Ecores) - biological evidence: cDNA, EST, SAGE
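As an illustration of the first signal in the list, here is a minimal sketch of an open-reading-frame scan: on the forward strand only, it looks for an ATG followed by the first in-frame stop codon (TAA, TAG, TGA). The function name `find_orfs` and the `min_codons` cut-off are hypothetical; real gene finders combine this with splice-site, composition and similarity evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list:
    """Return (start, end, frame) for ORFs on the forward strand.

    start: 0-based index of the ATG; end: index just after the stop codon.
    Length is counted in codons, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG downstream
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2))  # [(2, 17, 2)]
```

A length cut-off matters because short ORFs occur by chance in any sequence; choosing its value is part of what gene prediction programs tune.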
Gene annotation: Celera pipeline (Otto) and Ensembl pipeline, combining "ab initio" gene finding and homology. 9896 transcripts; 14564 transcripts; transcripts and genes identified exclusively by one pipeline; "consensus set"
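One way to picture the comparison of two annotation pipelines is a coordinate-overlap test: a prediction counts as shared if it overlaps a prediction from the other pipeline on the same chromosome and strand, and as exclusive otherwise. This is a simplified sketch, not the procedure used to build the actual consensus set. The `Model` record and the function names are hypothetical, and the all-against-all comparison is quadratic, which is acceptable for illustration but not for a whole genome.

```python
from typing import NamedTuple


class Model(NamedTuple):
    """A predicted gene reduced to its genomic span (hypothetical record)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    strand: str


def overlaps(a: Model, b: Model) -> bool:
    """True if two models share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def compare_sets(set_a, set_b):
    """Split two prediction sets into shared loci and pipeline-specific loci."""
    shared = [a for a in set_a if any(overlaps(a, b) for b in set_b)]
    only_a = [a for a in set_a if not any(overlaps(a, b) for b in set_b)]
    only_b = [b for b in set_b if not any(overlaps(b, a) for a in set_a)]
    return shared, only_a, only_b


def naive_consensus(set_a, set_b):
    """Union of both sets in which overlapping loci are kept once (as A's model)."""
    shared, only_a, only_b = compare_sets(set_a, set_b)
    return shared + only_a + only_b


pipeline_a = [Model("a1", "2L", 100, 500, "+"), Model("a2", "2L", 1000, 1500, "+")]
pipeline_b = [Model("b1", "2L", 400, 800, "+"), Model("b2", "X", 0, 50, "-")]
print([m.name for m in naive_consensus(pipeline_a, pipeline_b)])  # ['a1', 'a2', 'b2']
```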
Does a gene prediction correspond to a real gene? Problems: - real gene? (mono-exonic predictions!) - small exons, intron-exon structure - first and last exons - untranslated regions (UTRs) - genes for atypical or specific proteins - tandemly duplicated genes - pseudogenes
Comparison An. gambiae vs. D. melanogaster: common ancestor ~250 Myr ago. Anopheles gambiae: ~273 Mb, annotated genes (13765 in AgamP3). Drosophila melanogaster: ~130 Mb, annotated genes
Protein similarity, D. melanogaster (Dm) vs. An. gambiae (Ag). Categories: 1:1 orthologs (6089 pairs, average identity 56%); "many-to-many" orthologs (duplications); homologs whose best matches are in insects; homologs whose best matches are in non-insects; species-specific. Figure values (per species, %): 44.2, 11.0, 15.9, 10.3, 18.6 and 47.2, 13.8, 17.9, 10.0, 11.1. Increased speed of divergence: Dm-Ag 1:1 orthologs average 56% identity, vs. 61% for human-Fugu orthologs. Adapted from: Zdobnov E. Science (2002), 298,
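A common heuristic for calling 1:1 orthologs is the reciprocal best hit: gene a in species A and gene b in species B are paired when each is the other's highest-scoring match. The sketch below assumes precomputed similarity scores (for example BLAST bit scores) keyed by (query, subject); it illustrates the idea and is not necessarily the method used in the cited study. Gene identifiers are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (hypothetical identifiers); Ag2 and Ag3 compete for Dm2.
ag_vs_dm = {("Ag1", "Dm1"): 500, ("Ag1", "Dm2"): 200,
            ("Ag2", "Dm2"): 300, ("Ag3", "Dm2"): 350}
dm_vs_ag = {("Dm1", "Ag1"): 480, ("Dm2", "Ag2"): 290, ("Dm2", "Ag3"): 340}
print(reciprocal_best_hits(ag_vs_dm, dm_vs_ag))  # [('Ag1', 'Dm1'), ('Ag3', 'Dm2')]
```

Ag2 has Dm2 as its best hit but is not Dm2's best hit, so it is left out of the 1:1 set; duplicated genes produce exactly this kind of case.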
Genome, gene expression and annotation. Starting point: publication of the Anopheles gambiae genome; genome sequence incomplete; first characterisation and annotation of genes, of variable quality. Problems: post-genomic analysis difficult; gene detection. Approach: full-length-enriched cDNA libraries from different developmental stages and tissues. Aims: - identification of new genes - improved description of gene structure (TSS, UTRs, exon/intron) - alternative splicing - recombinant proteins - facilitating comparative genomics
How to get more information?
Experimental evidence: - Transcriptome - Proteome - Biochemistry - RNAi - Transgenesis
From DNA to mRNA (figure): DNA (promoter, TSS, ATG, STOP, poly(A) site, TTS) => pre-mRNA => mRNA (CAP, 5’UTR, AUG, CDS, STOP, 3’UTR, poly(A)). Modified from Zhang MQ, Nat. Rev. Genet. 2002(9): and from Ben-Dor, S
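The DNA-to-mRNA relationship can be sketched in a few lines: exons are joined (introns dropped), and the mature transcript is split into 5’UTR, CDS and 3’UTR at the first ATG and the first in-frame stop codon. Sequences are written in DNA letters, and taking the first ATG as the start codon is a simplifying assumption (real start-site choice depends on sequence context). The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def splice(gene: str, exons: list) -> str:
    """Join exon sequences given as (start, end), 0-based and end-exclusive."""
    return "".join(gene[start:end] for start, end in exons)


def split_mrna(mrna: str) -> tuple:
    """Split an mRNA (DNA letters) into (5'UTR, CDS, 3'UTR).

    The CDS runs from the first ATG to the first in-frame stop codon.
    """
    start = mrna.find("ATG")
    if start == -1:
        raise ValueError("no start codon")
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon")


gene = "GCATGCCC" + "GTAAGTAG" + "GGGTAAAC"  # exon 1, intron, exon 2 (toy)
mrna = splice(gene, [(0, 8), (16, 24)])
print(split_mrna(mrna))  # ('GC', 'ATGCCCGGGTAA', 'AC')
```

Run on the unspliced toy gene, `split_mrna` raises `ValueError`: the intron shifts the reading frame and no in-frame stop codon is found, which is one reason intron-exon structure matters for annotation.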
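“Oligo-capping” (Maruyama & Sugano): capped mRNA (5’ Gppp cap, poly(A) tail) and uncapped mRNA (5’ phosphate or 5’ OH). BAP (bacterial alkaline phosphatase) treatment: removes the 5’ phosphate of uncapped mRNAs, leaving a 5’ OH; capped mRNAs are protected. TAP (tobacco acid pyrophosphatase) treatment: removes the cap, leaving a 5’ phosphate. RNA ligation: the 5’-oligo is attached only to 5’-phosphate ends, i.e. to formerly capped mRNAs. Adapted from: Suzuki et al. Genome Res. (2001) 11(5):677-84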
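The selection logic of oligo-capping can be written as a small state simulation: after BAP and then TAP, only molecules that originally carried a cap have a 5’ phosphate, so only they accept the 5’ oligo. The `RNA` record and the function names are hypothetical; enzyme behaviour is reduced to the 5’-end chemistry shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class RNA:
    name: str
    five_prime: str  # 5' end: "cap", "phosphate" or "OH"
    oligo: bool = False


def bap(rnas):
    """BAP removes 5' phosphates; capped ends are protected."""
    for r in rnas:
        if r.five_prime == "phosphate":
            r.five_prime = "OH"


def tap(rnas):
    """TAP removes the cap, leaving a 5' monophosphate."""
    for r in rnas:
        if r.five_prime == "cap":
            r.five_prime = "phosphate"


def ligate_oligo(rnas):
    """RNA ligase attaches the 5' oligo only to 5'-phosphate ends."""
    for r in rnas:
        if r.five_prime == "phosphate":
            r.oligo = True


def oligo_capping(rnas):
    bap(rnas)  # order matters: dephosphorylate before decapping
    tap(rnas)
    ligate_oligo(rnas)
    return [r.name for r in rnas if r.oligo]


pool = [RNA("full-length mRNA", "cap"),
        RNA("truncated mRNA", "phosphate"),
        RNA("degraded fragment", "OH")]
print(oligo_capping(pool))  # ['full-length mRNA']
```

Swapping the order of `bap` and `tap` would strip the phosphate that decapping just exposed, and nothing would be ligated; the protocol's order is what makes the cap act as a selectable marker for full-length 5’ ends.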
Full-length cDNA synthesis steps: ligation of the 5’ oligo; first-strand synthesis (oligo-dT primer, TTTT); alkaline degradation of the RNA; PCR; SfiI digestion. Adapted from: Suzuki et al. Genome Res. (2001) 11(5):677-84
Full-length cDNA libraries. Modified from:
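SfiI recognises the interrupted site GGCCNNNNNGGCC (N = any base) and cuts the top strand after the fourth N, leaving a 3-nucleotide 3’ overhang. A minimal sketch for locating such sites, for instance to check whether a sequence contains an internal SfiI site; the function name is hypothetical.

```python
import re

# GGCC + 5 arbitrary bases + GGCC; the lookahead also reports overlapping sites.
SFII_SITE = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")


def sfii_sites(seq: str) -> list:
    """Return (site_start, site_sequence, top_strand_cut_index) for each SfiI site.

    The top strand is cut between GGCCNNNN and NGGCC, i.e. 8 bases into the site.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1), m.start() + 8) for m in SFII_SITE.finditer(seq)]


print(sfii_sites("AAGGCCATTAGGGCCTT"))  # [(2, 'GGCCATTAGGGCC', 10)]
```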
More genes to discover? Pilot project (adult females): reads => ~3700 clusters; 3032 Ensembl genes; 85% improved cDNAs; submitted to EMBL: 654 new genes
Perfect annotation: gene model, cDNA data, proteins - Ensembl -
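The slides do not say how reads were grouped into clusters. One simple approach, assumed here, is to map reads to the genome and merge reads whose alignments overlap (single linkage on genomic intervals). The input format and the function name are hypothetical, and strand is ignored for brevity.

```python
def cluster_reads(alignments):
    """Group aligned reads into clusters of overlapping genomic intervals.

    alignments: iterable of (read_id, chrom, start, end), 0-based, end-exclusive.
    Returns a list of clusters, each a list of read ids.
    """
    clusters = []
    current, cur_chrom, cur_end = [], None, -1
    for read_id, chrom, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if chrom == cur_chrom and start < cur_end:
            current.append(read_id)      # overlaps the open cluster: extend it
            cur_end = max(cur_end, end)
        else:
            if current:
                clusters.append(current)  # close the previous cluster
            current, cur_chrom, cur_end = [read_id], chrom, end
    if current:
        clusters.append(current)
    return clusters


reads = [("r1", "2L", 100, 400), ("r2", "2L", 350, 700), ("r3", "2L", 900, 1200),
         ("r4", "3R", 100, 300), ("r5", "2L", 650, 800)]
print(cluster_reads(reads))  # [['r1', 'r2', 'r5'], ['r3'], ['r4']]
```

Each cluster can then be compared with existing gene models; a cluster that overlaps none of them is a candidate new gene.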
Improvement of annotation - Ensembl -
600 new genes: what are they like?
”New” genes - Ensembl -
Predictions and proof - Ensembl -
Full-length cDNA libraries
Clustering results: 5664 clusters, new genes. By library: adult females (4056), embryos (1816), larvae (1982)
Available or planned cDNA libraries. Available/sequenced: adult females, embryos, larvae, salivary glands. Planned: pupae
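If the per-library figures count clusters, they add up to 4056 + 1816 + 1982 = 7854, more than the 5664 clusters overall, so many clusters must contain reads from more than one library. Set operations make that bookkeeping explicit; the sketch below uses toy cluster identifiers, and the function name is hypothetical.

```python
def library_overlap(libs: dict) -> dict:
    """Summarise how clusters (sets of ids) are shared between cDNA libraries."""
    summary = {
        "total": len(set().union(*libs.values())),
        "in_all": len(set.intersection(*libs.values())),
    }
    for name, clusters in libs.items():
        others = set().union(*(c for n, c in libs.items() if n != name))
        summary["only_" + name] = len(clusters - others)  # library-specific clusters
    return summary


libs = {"females": {"c1", "c2", "c3"},
        "embryos": {"c2", "c4"},
        "larvae": {"c2", "c3", "c5"}}
print(library_overlap(libs))
```

Library-specific clusters are the interesting ones for stage- or tissue-specific genes, which is one argument for sequencing several libraries rather than one.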
Plasmodium life cycle. Adapted from Waters, Science 301 (2003) and from James, A.A. (2003), 206:
An. gambiae salivary glands. Sporozoites: - invasion, specific receptors - secretory cells - storage - influence on normal functions. Saliva: - proteins involved in hematophagy - modulation of immune defense
Anopheles-specific SG1 family, chromosome X (position in kb); AgamP3, Moz2a v.34. Adapted from: Arca et al. J. Exp. Biol. (2005), 208:3971
What does genomics offer for malaria control?
Vector eradication. Adapted from Waters, Science 301 (2003)
Insecticides: - monitoring of insecticide resistance genes: pyrethroid resistance, epidemiology - detoxifying enzymes: Detox-chip - new targets
Host-vector relationship. Adapted from Waters, Science 301 (2003)
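Monitoring a resistance gene in field samples usually comes down to genotype counts. A minimal sketch of the standard calculation, assuming a single biallelic locus with a resistant (R) and a susceptible (S) allele: the R allele frequency is (2*RR + RS) / 2N, and Hardy-Weinberg expectations give a quick check of the genotype proportions. The allele labels and the function name are hypothetical.

```python
def resistance_allele_summary(rr: int, rs: int, ss: int) -> dict:
    """Allele frequency and Hardy-Weinberg expected counts for one biallelic locus."""
    n = rr + rs + ss
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * rr + rs) / (2 * n)  # frequency of the resistant allele R
    q = 1 - p                    # frequency of the susceptible allele S
    return {
        "freq_R": p,
        "expected_RR": n * p * p,
        "expected_RS": n * 2 * p * q,
        "expected_SS": n * q * q,
    }


print(resistance_allele_summary(rr=10, rs=20, ss=70))
```

With 10 RR, 20 RS and 70 SS mosquitoes, freq_R is 0.2 and the expected counts are about 4, 32 and 64; an observed excess of RR and deficit of RS like this one would merit a closer look at the sampled population.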
Behaviour. Candidate genes: - odorant (smell): 79 putative odorant receptors - gustatory (taste): 76 putative gustatory receptors. Relevant to: attractants/repellents, host location, vectorial capacity, mating, oviposition, traps
Transmission blocking: SM1. Adapted from Waters, Science 301 (2003)
SM1 = salivary gland and midgut peptide 1 (Ghosh et al. 2000), 12 amino acids. Transgenic mosquitoes: - midgut expression: blocks Plasmodium - salivary gland expression: blocks invasion - selective advantage of transgenic mosquitoes: proof of principle (Marrelli et al., PNAS 104, 2007)
Immune system. Adapted from Waters, Science 301 (2003)
Efficient natural immune system: - resistance against Plasmodium - melanotic encapsulation - several loci - identify genes - multiply/release naturally occurring resistant strains
Salivary glands. Adapted from Waters, Science 301 (2003)
- cDNAs - RT-PCR - protein expression - SAGE analysis of the transcriptome - proteomic studies of salivary glands and saliva - salivary gland-sporozoite relationships - saliva and humans: immunomodulatory function? vaccine target?
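In classic SAGE, each transcript is represented by a short tag: the 10 bases immediately 3’ of the 3’-most NlaIII site (CATG). Counting tags across a library estimates transcript abundance. A minimal sketch working directly on transcript sequences (real SAGE reads tags from sequenced concatemers of ditags); the function names are hypothetical.

```python
from collections import Counter
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site (anchoring enzyme in classic SAGE)
TAG_LEN = 10     # classic SAGE tag length


def sage_tag(transcript: str) -> Optional[str]:
    """Return the tag 3' of the last CATG, or None if there is no complete tag."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    start = pos + len(ANCHOR)
    if pos == -1 or start + TAG_LEN > len(seq):
        return None
    return seq[start:start + TAG_LEN]


def tag_counts(transcripts) -> Counter:
    """Count SAGE tags over a collection of transcripts (a crude expression profile)."""
    return Counter(tag for tag in map(sage_tag, transcripts) if tag is not None)


library = ["AAACATGGGGCATGACGTACGTACAAAA", "GGCATGACGTACGTACTT", "AAAAAA"]
print(tag_counts(library))  # Counter({'ACGTACGTAC': 2})
```

Comparing such counts between, for example, salivary glands and whole females is what makes SAGE useful for finding tissue-enriched transcripts.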
Where to go from here? Annotation is a continuing process: - a new start for Anopheles research - improvement of genomic annotation: sequence, gene models, promoters - genome arrays: infected vs. non-infected - annotation at the protein level: protein interaction networks - from hypotheses to experiments - comparative genomics: more insects (honey bee, Aedes) - transgenic mosquitoes, RNAi experiments
Aedes aegypti - start: September; end: spring 2007? - first assembly, version - public: first annotations, cDNA sequences - genome size: > 1300 Mb, in 4758 supercontigs - "evolutionary distance" Anopheles-Aedes: ~ Myr. From:
Gene size comparison: Anopheles 4 kb, Aedes 16 kb
Aedes aegypti annotation: - genome size larger than expected (780 Mb => 1300 Mb) - different sequencing strategy - cDNAs early in the project - high content of repeat sequences (~68%) - adapted gene prediction programs - long genes, nested genes - Anopheles vs. Aedes: synteny between chromosome arms
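Repeat content is often read off a soft-masked assembly, in which repeat-masked bases are written in lowercase (the convention of Ensembl's soft-masked "dna_sm" FASTA files). A minimal sketch, assuming such a file has been read into a string; it counts lowercase letters, lowercase n included, and is an illustration rather than a reproduction of the ~68% estimate. The function name is hypothetical.

```python
def masked_fraction(fasta_text: str) -> float:
    """Fraction of sequence letters written in lowercase (soft-masked), all records."""
    total = masked = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip record headers
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


toy = ">chr1\nACGTacgt\nAAaa\n>chr2\nnnNN\n"
print(masked_fraction(toy))  # 0.5
```

Since repeats are also what gene prediction programs must avoid treating as genes, the same masking is typically applied before running them.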
Databases: - Ensembl: Anopheles gambiae, Aedes aegypti - Anobase: Anopheles gambiae - VectorBase: Anopheles gambiae, Aedes aegypti, Ixodes scapularis - TIGR (The Institute for Genomic Research)
Main orientations in mosquito research: - genome/proteome characterisation, promoters/expression - understanding behaviour: identification, traps - immunity - vector control strategies: transgenesis, genetically modified mosquitoes (reduced parasite transmission, functional genomics), new insecticidal targets, vaccines
Collaborations. Institut Pasteur, BBMI: Paul Brey, Charles Roth, Inge Holm, Pierre Dehoux (PF4), Shawn Gomez, Sylvie Perrot, Marie-Kim Chaveroche, Jean Sautereau. Plate-forme Genomique (PF1): Christiane Bouchier, Anthony Lepelletier. Genoscope: Jean Weissenbach, Beatrice Segurens, Patrick Wincker, Gabor Gyapay, Corinne da Silva, Betina Porcel. AMSUD Network, Universidade de Sao Paulo: Sergio Verjovski-Almeida, Hamza el Dorry, Suely L. Gomes, Carlos F.M. Menck, Ana L. Nascimento