Presentation is loading. Please wait.

Presentation is loading. Please wait.

Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki.

Similar presentations


Presentation on theme: "Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki."— Presentation transcript:

1 Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki Erice 30 Nov 2005

2 Pairwise alignment of strings A: S T O C K H O L M B: T U K H O L M A minimum number of ’mutation’ steps: a -> b a -> є є -> b …

3 Dynamic programming d i,j = min(if a i =b j then d i-1,j-1 else , d i-1,j + 1, d i,j-1 + 1) = distance between i-prefix of A and j-prefix of B (without substitutions) d i,j d i-1,j-1 d i,j-1 d i-1,j d m,n mxn table d A B ai ai bj bj +1

4 d i,j = min(if a i =b j then d i-1,j-1 else , d i-1,j + 1, d i,j-1 + 1) d ID (A,B) optimal alignment by trace-back

5 Homology searches find homologous sequences: new sequence versus all old ones in database – the most popular computational task in present-day molecular biology = approximate string matching BLAST - big success good homology => same biological function D A T A B A S E NEW SEQUENCE ?

6 Multiple alignment multiple alignment of sequence families to find interesting conserved motifs: NP-hard => heuristics, Hidden Markov models, MCMC comparison of entire genomes

7 Gene enhancer module prediction

8 Problem Gene expression regulation in multicellular organisms is controlled in combinatorial fashion by so called transcription factors (TFs). Transcription factors bind to DNA cis-elements (TF binding sites) on enhancer modules (promoters), and multiple factors need to bind to activate the module. In mammals, the modules are few and far The problem: Locate functional regulatory modules, that is, find interesting patterns.

9 Gene enhancer modules gene 1 gene 2 gene 3 gene 4 DNA RNA transcription translation Proteins transcription factors enhancer module

10 Model of cell type specific regulation of target gene expression GLIXY (tissue specific TFs) GLI Ubiquitously expressed TF transcription Common targets (e.g. Patched): Cell type specific targets (e.g. N-myc):

11 Binding affinity matrices The TF binding sites are represented by affinity matrices. –A column per position –A row per nucleotide Discovered: –Computationally –Traditional wet lab –Microarrays 9 11 49 51 0 1 1 4 19 3 0 0 0 45 25 16 5 1 2 0 17 0 4 21 18 36 0 0 34 5 21 10

12 Binding affinity matrices 9 11 49 51 0 1 1 4 19 3 0 0 0 45 25 16 5 1 2 0 17 0 4 21 18 36 0 0 34 5 21 10

13 Determined TF binding profiles (+ JASPAR)

14 Finding conserved motifs of binding sites looking at one (human) genome gives too many positives comparative genomics approach: –take the 200 kB regions surrounding the same genes (paralogs and orthologs) of different mammals: human, mouse, chicken, … –find conserved clusters (= motifs) of binding sites cluster = group of binding sites with good local alignment = > Smith-Waterman type algorithm with a novel scoring function

15 Smith-Waterman find the best local alignment of strings A and B: substring X of A and substring Y of B such that X and Y have the best scoring pairwise alignment X Y

16 Computational identification of enhancer elements Preserved in evolution: –Affinities of functional cis- elements. –Spatial arrangement of elements within a module. Human Mouse

17 Parameter optimization scoring function has 3 free parameters. Find good parameters by greedy hill climbing using a training data

18 Whole genome comparisons Whole genomes can be analyzed with our implementation EEL (Enhancer Element Locator) We compared human genes to orthologs in mouse, rat, chicken, fugu, tetraodon and zebrafish –100 kbp flanking regions on both sides of the gene. –Coding regions masked out. –About 20 000 comparisons for each pair of species.

19 Annotating the Human genome with mammalian enhancer-elements

20 EEL output ● Output from EEL program. ● Previously known functional sites are highlighted ● DNA between the sites is aligned just for the output

21 Enhancer prediction for N-myc 200 kb Mouse N-Myc genomic region 200 kb Human N-Myc genomic region Conserved GLI binding sites in two predicted enhancer elements, CM5 and CM7 coding region of N-Myc

22 Wet-lab verification ● Selected some predicted enhancer modules for wet-lab verification ● Fused 1kb DNA segment containing the predicted enhancer to a marker gene (LacZ) with a minimal promoter, and generated transgenic embryos.

23 Enhancer prediction for N-myc 200 kb Mouse N-Myc genomic region 200 kb Human N-Myc genomic region Conserved GLI binding sites in two predicted enhancer elements, CM5 and CM7 coding region of N-Myc

24 Summary input: +- 100 kb flanking sequences of DNA of orthologous pairs of genes from human and mouse find all good enough TF binding sites from the sequences find the best local alignments of the binding sites using the EEL scoring function output: the sequences in good local alignments; these are the putative enhancers postprocessing: an expert biologist selects the most promising predictions for wet lab verification; hopefully he/she has good luck!

25 Acknowledgements Kimmo Palin Outi Hallikas (Biom) Jussi Taipale (Biom) The BioSapiens project is funded by the European Commission within its FP6 Programme, under the thematic area "Life sciences, genomics and biotechnology for health,"contract number LHSG-CT-2003-503265.


Download ppt "Finding regulatory modules from local alignment - Department of Computer Science & Helsinki Institute of Information Technology HIIT University of Helsinki."

Similar presentations


Ads by Google