Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park.

1 Comparative analysis of eukaryotic genes Mar Albà http://genomics.imim.es/evolgenome Barcelona Biomedical Research Park

2 Genome Projects GOLD: Genomes Online Database (www.genomesonline.org)

3 Genome Projects GOLD: Genomes Online Database (www.genomesonline.org)

4 Genome Projects GOLD: Genomes Online Database (www.genomesonline.org)

5 Genome Browsers
- NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/
- Ensembl http://www.ensembl.org
- UCSC Genome Browser http://genome.cse.ucsc.edu
The three databases use the same genome assembly, which is generated by NCBI.

6 Ensembl

7 -genomic regions -alignments with syntenic sequences -genes - Homologs, SNPs - transcripts - EMBL mRNAs, ESTs, Expression -proteins -Gene Ontology (function), protein domains, disease associations

8 Ensembl - Biomart - retrieval of information on gene datasets

9 Gene comparative sequence analysis
Genome and transcriptome projects have generated a vast amount of information on protein-coding and non-coding gene sequences. Identification of conserved sequences in different genes can help us understand gene evolution and identify functional regions.
[Figure: promoter and coding regions of N genes (orthologs) compared across species 1, species 2, ..., species m]

10 Non-coding sequences in vertebrate genomes
- only 1.2% of the human genome codes for proteins
- but 5% exhibits high sequence conservation levels, compatible with negative selection (MGSC, 2002)
- non-coding sequences include:
  - transcription regulatory regions
  - introns
  - non-protein coding exons/genes (miRNAs, etc.)
  - repetitive elements (Alus, etc.)
  - ultra-conserved elements

11 Gene transcription regulatory sequences Maston et al., 2006 Annu. Rev. Genomics Hum. Genet. 7: 29-59

12 Frequently-found metazoan motifs in the core promoter Maston et al., 2006

13 Wray et al. (2003), Mol. Biol. Evol. 20(9):1377-1419. Eukaryotic promoter diversity

14 High evolvability of regulatory sequences
- most of the changes in regulatory networks are likely to occur in cis; changes in trans (transcription factors) may often have effects that are too strong
- a single mutation may lead to the acquisition of a new DNA-factor interaction (rapid turnover)
- the expression in one tissue may evolve independently of expression in another tissue (promoter modular organization)
Wray et al. (2003) The Evolution of Transcriptional Regulation in Eukaryotes. Mol. Biol. Evol. 20(9):1377-1419.

15 Transcription factor binding sites (TFBS) are short and imprecise
- short sequence motifs (6-12 bp)
- some positions of the motif are variable
- sometimes different transcription factors can recognize the same sequence motif
Example: TATA box
TATAAA
TATAGA
TATAAA
GATAAA
TATAAA
TATAAT
 ***

16 Transcription factor binding sites (TFBS): weight matrices
TATAAA
TATAGA
TATAAA
GATAAA
TATAAA
TATAAT
 ***

     1  2  3  4  5  6
A    0  8  0  8  7  7
C    0  0  0  0  0  0
G    1  0  0  0  1  0
T    7  0  8  0  0  1

-> can be used to search for putative motifs in sequences
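
As a rough illustration of how a weight matrix is built and used, the Python sketch below counts the bases at each position of the six sites listed on the slide and then scans a made-up sequence with log-odds scores. The pseudocount, the uniform background, the threshold and the test sequence are assumptions chosen for illustration, not values from the presentation. The matrix on the slide has columns that sum to 8, so it was presumably built from eight sites; the counts computed here from six sites differ accordingly.

```python
import math

# The six aligned TATA-box sites listed on the slide.
SITES = ["TATAAA", "TATAGA", "TATAAA", "GATAAA", "TATAAA", "TATAAT"]
BASES = "ACGT"


def count_matrix(sites):
    """Return {base: [count at each position]} for equal-length sites."""
    length = len(sites[0])
    counts = {base: [0] * length for base in BASES}
    for site in sites:
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts


def log_odds(counts, n_sites, pseudocount=0.5, background=0.25):
    """Turn counts into log2 odds scores (pseudocount and background are assumed)."""
    length = len(counts["A"])
    total = n_sites + 4 * pseudocount
    return {
        base: [
            math.log2((counts[base][i] + pseudocount) / total / background)
            for i in range(length)
        ]
        for base in BASES
    }


def scan(sequence, pwm, threshold):
    """Yield (start, score) for every window scoring at least the threshold."""
    width = len(pwm["A"])
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            yield start, score


counts = count_matrix(SITES)
pwm = log_odds(counts, len(SITES))
# Hypothetical test sequence containing one TATAAA at offset 3.
hits = list(scan("GGCTATAAAAGGC", pwm, threshold=5.0))
```

Lowering the threshold returns more windows, which is one way to see why matrix searches on their own tend to produce many candidate sites.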

17 Transcription factor binding site databases
- TRANSFAC http://transfac.gbf.de/TRANSFAC/ http://www.biobase.de
- TRRD http://www.bionet.nsc.ru/trrd/
- Place http://www.dna.affrc.go.jp/htdocs/PLACE/
- ooTFD / rTFD http://www.ifti.org/cgi-bin/ifti/ootfd.pl
- SCPD http://cgsigma.cshl.org/jian/
- RegulonDB http://regulondb.ccg.unam.mx/

18 TFBS prediction using weight matrices PROMO Farré, D., et al. (2003). Nucleic Acids Research 31: 1739-1748. http://promo.lsi.upc.edu

19 High false positive rate in TFBS prediction
Test sequences: 200 vertebrate promoter sequences, 607 experimentally verified sites. Blanco, E., et al. (2006). Nucleic Acids Research 34: D63-D67.
Predictions: Transfac v.6.4
Sensitivity: 46%
Specificity: 2% (very low!)
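
To give a feel for what these two percentages imply, the sketch below back-calculates the approximate number of predictions. It assumes that "specificity" is used here in the sense common in sequence-annotation benchmarks, namely the fraction of predictions that are correct (TP / (TP + FP)); the slide does not define the term, so the derived counts are only indicative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real sites that were predicted."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of predictions that are real sites (assumed meaning of 'specificity')."""
    return true_pos / (true_pos + false_pos)


verified_sites = 607   # from the slide
reported_sens = 0.46   # from the slide
reported_spec = 0.02   # from the slide, read here as TP / (TP + FP)

true_pos = round(reported_sens * verified_sites)
total_predictions = round(true_pos / reported_spec)
false_pos = total_predictions - true_pos

print(true_pos, total_predictions, false_pos)
```

Under that reading, roughly 280 correct predictions are buried among about 14,000 predicted sites.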

20 Comparative approaches are necessary
Select those motifs or regions that are shared by:
- orthologous sequences: phylogenetic footprinting
- co-expressed genes: shared regulatory motifs

21 Boffelli D, Nobrega MA, Rubin EM. (2004) Nat Rev Genet. 5:456-65 Phylogenetic footprinting

22 Highly conserved enhancer in gene DACH1 Phylogenetic footprinting

23 Proximal promoter pre-initiation complex

24 Motif positional bias Signal Search Analysis Server (SIB)

25 Why should some motifs show positional bias?
- promoter structure
- protein-protein interaction positional constraints
[Figure: a predicted element (TFBS 1) positioned relative to a known reference element, shown for the proximal promoter (TSS, PIC, ACT) and for a regulatory module (TFBS 1 and TFB 2, bound by TF1 and TF2)]

26 PEAKS: identification of motif positional bias
[Figure: over-representation of a predicted element (TFBS) relative to a known reference element (TSS) in functionally-related sequences (e.g. co-expressed) compared with random sequences]

27 PEAKS Step 1. Construct motif frequency profile
[Figure: occurrences of the predicted element in seq1 to seq4, positioned relative to the known reference element, are counted with a sliding window to build the profile]

28 PEAKS Step 1. Construct motif frequency profile 308 housekeeping genes Transfac v.6.4 matrix library TSS
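
Step 1 can be sketched roughly as follows. This is not the PEAKS implementation: the function name, the window and step sizes, and the choice to count every motif occurrence (rather than sequences containing the motif) are assumptions made for illustration. Positions are given relative to the reference element, for example the TSS.

```python
def motif_profile(hit_positions, region_start, region_end, window=50, step=10):
    """Count motif occurrences in sliding windows across a set of sequences.

    hit_positions: one list per sequence with motif positions relative to
                   the reference element (negative = upstream).
    Returns a list of (window_start, count) tuples.
    """
    profile = []
    start = region_start
    while start + window <= region_end:
        end = start + window
        count = sum(
            1
            for hits in hit_positions
            for pos in hits
            if start <= pos < end
        )
        profile.append((start, count))
        start += step
    return profile


# Hypothetical motif positions in four promoters.
example_hits = [[-80, -30], [-75], [-78, 200], []]
profile = motif_profile(example_hits, -100, 0, window=50, step=25)
```

In this toy example the window starting at -100 collects three occurrences, which is the kind of local accumulation the profile is meant to reveal.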

29 PEAKS Step 2. Measure significance of peaks
For each matrix:
Score (max peak) = Sa x Sb x Sc
Sa = max peak / num motif
Sb = max peak / num seq
Sc = max peak / average num motifs
[Figure: CAAT-box profile (axis labels +675, -325) showing the maximum peak, the average signal and their difference]
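
The score on this slide can be written out as a small function. The slide only gives the three ratios, so the interpretation of each denominator is an assumption here: "num motif" is taken as the total number of motif occurrences, "num seq" as the number of sequences in the dataset, and "average num motifs" as the mean height of the profile.

```python
def peak_score(profile_counts, num_motifs, num_sequences):
    """Score = Sa * Sb * Sc for the highest peak of a motif frequency profile.

    profile_counts: motif counts per window (the profile from Step 1).
    num_motifs:     total motif occurrences (assumed meaning of 'num motif').
    num_sequences:  number of sequences in the dataset.
    """
    max_peak = max(profile_counts)
    if max_peak == 0:
        return 0.0
    average = sum(profile_counts) / len(profile_counts)  # assumed 'average num motifs'
    sa = max_peak / num_motifs
    sb = max_peak / num_sequences
    sc = max_peak / average
    return sa * sb * sc


# Toy profile: 3 windows, 5 motif occurrences in 4 sequences.
score = peak_score([3, 2, 1], num_motifs=5, num_sequences=4)
```

Each factor rewards a different property: a peak that holds most of the motif occurrences, one that is supported by many sequences, and one that stands well above the average signal.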

30 PEAKS Step 2. Measure significance of peaks
- determine random expectation score cut-off for different levels of significance using 1000 random datasets
- define significant signal range: cut-off 0.005
[Figure: CAAT-box profile with maximum peak and average signal]
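
One common way to turn scores from random datasets into a cut-off is an empirical quantile, sketched below. How PEAKS generates its random datasets and derives the cut-off is not stated on the slide, so this is a generic illustration; the function names are hypothetical and the random scores are placeholders.

```python
def score_cutoff(random_scores, alpha=0.005):
    """Return the score reached by only a fraction alpha of the random datasets."""
    ordered = sorted(random_scores, reverse=True)
    k = max(1, round(alpha * len(ordered)))
    return ordered[k - 1]  # k-th highest random score


def empirical_p_value(observed, random_scores):
    """Fraction of random datasets scoring at least as high as the observed peak."""
    at_least = sum(1 for s in random_scores if s >= observed)
    return at_least / len(random_scores)


# Placeholder scores standing in for 1000 random datasets.
random_scores = list(range(1000))
cutoff = score_cutoff(random_scores, alpha=0.005)
p_value = empirical_p_value(cutoff, random_scores)
```

With 1000 random datasets and a 0.005 level, the cut-off is the fifth-highest random score, so the resolution of the test is limited to steps of 0.001.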

31 PEAKS Step 3. Build “promoter type” 52 genes regulated by NFkB, p < 0.5% TATA Sp1 NFkB BACH1

32 PEAKS server http://genomics.imim.es/peaks/ Bellora, Farré and Albà (2007). Bioinformatics 23, 243-4.

33 PEAKS results: human promoter sequences, TRANSFAC vertebrate matrices
- 308 housekeeping genes: TATA, CAAT, GC-box, YY
- 52 NFkB regulated genes: TATA, NFkB, GC-box, BACH1

34 PEAKS results promoters from yeast genes, amino acid metabolism (86 genes) - 54 yeast weight matrices tested - significant regions detected by the method show significant enrichment in experimentally-validated sites

35 Measuring promoter sequence divergence
Divergence = non-aligned promoter fraction, or dSM (Castillo-Davis et al., 2004)
[Figure: two alignments of the promoter in species 1 and species 2, with divergence 0.8 and 0.4]
1. highly divergent -> fewer constraints
2. highly conserved -> more constraints
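
The non-aligned promoter fraction can be computed from the blocks that align between the two species. The sketch below assumes the aligned blocks are already available as half-open coordinate intervals on the species 1 promoter (for instance from a local alignment program); the function name and the example coordinates are hypothetical, chosen to reproduce the 0.8 and 0.4 values shown on the slide for a 2 Kb promoter.

```python
def non_aligned_fraction(promoter_length, aligned_blocks):
    """Fraction of the promoter not covered by any aligned block.

    aligned_blocks: (start, end) half-open intervals on the promoter;
                    overlapping blocks are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_blocks):
        start = max(start, last_end)      # skip the part already counted
        end = min(end, promoter_length)   # ignore anything past the promoter
        if end > start:
            covered += end - start
            last_end = end
    return 1 - covered / promoter_length


# Hypothetical alignments on a 2000 nt promoter.
divergent = non_aligned_fraction(2000, [(0, 300), (250, 400)])
conserved = non_aligned_fraction(2000, [(100, 1300)])
```

Merging overlapping blocks before summing matters when several local alignments hit the same stretch of promoter; otherwise the aligned fraction would be overestimated.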

36 Variability in promoter sequence divergence
8385 human-mouse orthologues, 2 Kb from transcription start site
Average divergence = 70%
[Figure: histogram of promoter divergence in bins of 0.1 from 0 to 1]

37 Regulatory genes contain more conserved promoters than structural/metabolic genes Functional classes enriched in high score promoter alignments Lee et al. (2006). BMC Genomics 6: 188 - consistent with results by Iwama and Gojobori (2004)

38 Structural/metabolic genes contain less highly conserved promoters than regulatory genes Functional classes enriched in low score promoter alignments Lee et al. (2006). BMC Genomics 6: 188

39 Comparison neurogenesis versus ribosomal
[Figure: panels for neurogenesis and ribosomal]
Lee et al. (2006). BMC Genomics 6: 188

40 Is expression breadth related to promoter sequence divergence?
Expression data from Zhang et al. (2004)
Human-mouse orthologues classed as tissue-specific, intermediate or housekeeping

41 promoter species 1 species 2 Measure sequence divergence -tissue-specific -intermediate -housekeeping Divergence = non-aligned promoter fraction 2 Kb

42 Relationship between promoter divergence and expression breadth
[Figure: coding sequence evolutionary rate and promoter divergence against number of tissues, but.. promoter divergence and coding sequence divergence compared for housekeeping, intermediate and tissue-specific genes]

43 Relationship between promoter divergence and expression breadth
- divergence measured in 100 nt bins
[Figure: % conservation by position relative to the TSS for housekeeping and non-housekeeping genes]
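
Measuring conservation in fixed-size bins can be sketched as below, assuming each promoter position has already been marked as aligned or not aligned. The mask representation and the function name are assumptions for illustration; the slide only states the 100 nt bin size.

```python
def conservation_by_bin(aligned_mask, bin_size=100):
    """Percentage of aligned positions in consecutive bins of the promoter.

    aligned_mask: sequence of booleans, True where the position is aligned.
    """
    percentages = []
    for start in range(0, len(aligned_mask), bin_size):
        chunk = aligned_mask[start:start + bin_size]
        percentages.append(100 * sum(chunk) / len(chunk))
    return percentages


# Hypothetical 200 nt stretch: first bin half aligned, second bin fully aligned.
mask = [True] * 50 + [False] * 50 + [True] * 100
bins = conservation_by_bin(mask)
```

Averaging these per-bin values over many genes gives a conservation curve along the promoter for each gene class.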

44 Promoter divergence and gene function (GO class > 50 genes, p-value < 0.01)
Highly divergent promoter:
- RNA binding
- ligase activity
- hydrolase activity
- catalytic activity
Highly conserved promoter:
- receptor binding
- signal transducer activity
- receptor activity
- structural molecule activity
- transcription regulator activity
- transcription factor activity
- DNA binding

45 Promoter divergence and gene function divergence

46 Summary
- the prediction of transcription factor binding sites is very noisy, so we need to use comparative genomics
- some motifs show positional bias; this property can help us understand the structure of promoters and improve motif predictions
- promoter sequence conservation is related to gene function and to gene expression breadth; the fact that housekeeping genes contain less conserved promoters may reflect simpler regulation of gene expression

47 The team
Evolutionary Genomics Group, Universitat Pompeu Fabra, Barcelona http://genomics.imim.es/evolgenome
Nicolas Bellora, Domènec Farré, Loris Mularoni, Macarena Toll, Medya Shikhagaie

