Nora Pierstorff Dept. of Genetics University of Cologne

1 Combined ab initio and comparative analysis of putative regulatory regions
Nora Pierstorff, Dept. of Genetics, University of Cologne, 30.8.2005

2 Outline
- Introduction
- Ab Initio Approach
- Datasets
- Comparative Analysis of Enhancers
- Results
- Combination of Both Approaches
- Results
- Discussion

3 Eukaryotic regulation model

4 Three approaches
- Search for binding sites of known transcription factors using position weight matrices (PWMs).
- Search for conserved motifs in the upstream regions of homologous or coregulated genes.
- Search for statistically overrepresented motifs.
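As a minimal sketch of the first approach, the Python below converts a count matrix into a log-odds position weight matrix and scans a sequence with it. The count matrix, the uniform background, the pseudocount of 0.5 and the score threshold are hypothetical values chosen for illustration; they do not describe a real transcription factor, and only the forward strand is scanned.

```python
import math

# Hypothetical count matrix for a 4 bp site (one dict per position).
# Invented numbers for illustration, not a real transcription factor profile.
COUNTS = [
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def log_odds_matrix(counts, background, pseudocount=0.5):
    """Convert per-position counts into log2(frequency / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        if any(base not in "ACGT" for base in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits


PWM = log_odds_matrix(COUNTS, BACKGROUND)

if __name__ == "__main__":
    for start, site, score in scan("TTGGAATT", PWM, threshold=5.0):
        print(start, site, round(score, 2))
```

With this toy matrix, only the consensus site GGAA at position 2 passes the threshold. The threshold is the critical choice: lowering it quickly admits many weak matches, which is the false-positive problem the ab initio slide below refers to.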

6 Ab Initio Approach (overrepresented patterns)
- Overrepresented patterns occur frequently in DNA, which leads to many false-positive predictions.
- The amount of available data is not large enough to derive additional reliable, universally valid rules.
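To see why short patterns are so frequent, here is a back-of-the-envelope estimate. It is an illustration added here, not a calculation from the presentation, and it assumes a random sequence with independent, equally frequent bases, counting one strand only. The expected number of occurrences of a fixed k-mer w in a sequence of length L is

```latex
\mathbb{E}[N_w] = \frac{L - k + 1}{4^{k}},
\qquad
k = 5,\ L = 1000:\ \frac{996}{1024} \approx 0.97,
\qquad
k = 6,\ L = 10^{6}:\ \frac{999\,995}{4096} \approx 244 .
```

So a given 5-mer is expected roughly once per kilobase by chance alone, and any single short match carries little evidence of a binding site.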

8 Dataset (collected by Nazina et al. 2003)
- Target species: Drosophila melanogaster
- Reference species: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis
- Number of sequences: 39
- Number of bp:
- Number of regulatory regions: 87
- Number of bp in enhancers:
- Enhancers per sequence: 2.462
- Amount of bp in enhancers:
[Figure: Dorsal motif and Dorsal matches]

10 Are enhancers alignable?
Emberly et al. (2003):
- The overlap between binding sites and conserved sequence blocks is not much greater than expected by chance, but it is still statistically significant.
- Compared organisms: D. melanogaster and D. pseudoobscura
- Alignment methods: LAGAN, SMASH (construct chains of local alignments)

11 Assumptions about enhancer conservation
- Binding sites contain core sequences that are essential for binding the transcription factor.
- These core sequences are conserved both among the binding sites within one species and between species.
- Binding sites are indicated by short, exactly conserved, overrepresented patterns.

12 Alignment of short exact matches
- Input: a chain of high-scoring fragments from the blastn alignment of each sequence pair
- Output: regions containing a large number of short conserved stretches
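A minimal sketch of the "short exact matches" step, under stated assumptions: each blastn fragment is assumed to be available as two gapped alignment rows (target and reference, uppercase) plus the start position of the fragment in the D. melanogaster sequence, and `min_len` is a hypothetical minimum stretch length, since the slide does not give the value used. Parsing blastn output and chaining the fragments are not shown.

```python
def exact_stretches(row_a, row_b, offset_a, min_len):
    """Return (start, end) intervals in ungapped sequence A where the two aligned
    rows agree exactly, without gaps, for at least min_len consecutive columns."""
    stretches = []
    pos_a = offset_a              # position in sequence A of the current column
    run_start, run_len = None, 0
    for x, y in zip(row_a, row_b):
        if x == y and x != "-":
            if run_start is None:
                run_start = pos_a
            run_len += 1
        else:
            # a mismatch or a gap ends the current exact run
            if run_start is not None and run_len >= min_len:
                stretches.append((run_start, run_start + run_len))
            run_start, run_len = None, 0
        if x != "-":
            pos_a += 1            # only non-gap characters advance sequence A
    if run_start is not None and run_len >= min_len:
        stretches.append((run_start, run_start + run_len))
    return stretches


def stretches_from_chain(chain, min_len):
    """chain: list of (row_a, row_b, offset_a) fragments from one sequence pair."""
    result = []
    for row_a, row_b, offset_a in chain:
        result.extend(exact_stretches(row_a, row_b, offset_a, min_len))
    return sorted(result)


if __name__ == "__main__":
    chain = [("ACGTACGTAAGG", "ACGTACGTTAGG", 100),
             ("ACG-TACGTACG", "ACGGTACGTACG", 0)]
    print(stretches_from_chain(chain, min_len=8))
```

Coordinates are reported in the target sequence so that stretches from several pairwise comparisons can later be pooled on one axis.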

14 Results using only the comparative approach with 5 species
m8: region score = number of short conserved stretches in a 200 bp window
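The region score can be sketched as a sliding-window count. This is an illustration under assumptions not stated on the slide: a stretch counts toward a window if its start position falls inside it, stretches from all pairwise comparisons are pooled in target coordinates, and the window step is a free (hypothetical) parameter.

```python
from bisect import bisect_left


def region_scores(stretch_starts, seq_len, window=200, step=1):
    """Return (window_start, count) pairs: conserved stretches starting in [w, w + window)."""
    starts = sorted(stretch_starts)
    last_window = max(seq_len - window, 0)
    scores = []
    for w in range(0, last_window + 1, step):
        # binary search counts the starts inside the half-open window
        count = bisect_left(starts, w + window) - bisect_left(starts, w)
        scores.append((w, count))
    return scores


if __name__ == "__main__":
    # Hypothetical stretch start positions on a 400 bp target sequence.
    starts = [10, 50, 150, 250, 260]
    scores = region_scores(starts, seq_len=400, window=200, step=100)
    print(scores)
    print("best window:", max(scores, key=lambda ws: ws[1]))
```

Here the first call prints `[(0, 3), (100, 3), (200, 2)]`. Sorting once and using binary search keeps each window query logarithmic, which matters when scoring every position of a long sequence with `step=1`.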

17 Searching for overrepresented motifs in conserved regions
- Input: all short conserved words
- Step 1: count the occurrences of every 5 bp substring of the word in the surrounding 1000 bp
- Step 2: calculate one observed/expected ratio for every species
- Output: conserved stretches containing at least one 5-mer that is overrepresented in every species
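The two steps above can be sketched as follows. The slide does not say how the expected count is obtained, so this sketch assumes a simple model based on the base composition of the surrounding sequence (independent bases); the cutoff `threshold=2.0` is likewise hypothetical, and the caller is assumed to supply one surrounding 1000 bp sequence per species.

```python
from math import prod

K = 5  # word length used on the slide


def count_occurrences(kmer, text):
    """Count overlapping occurrences of kmer in text."""
    k = len(kmer)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == kmer)


def obs_exp(kmer, context):
    """Observed / expected count of kmer in context, with the expectation taken
    from the context's own base frequencies (assumed independent bases)."""
    n_positions = len(context) - len(kmer) + 1
    if n_positions <= 0:
        return 0.0
    freqs = {base: context.count(base) / len(context) for base in "ACGT"}
    expected = n_positions * prod(freqs.get(base, 0.0) for base in kmer)
    if expected == 0:
        return 0.0  # k-mer contains a base that is absent from (or not counted in) the context
    return count_occurrences(kmer, context) / expected


def is_overrepresented(word, contexts, threshold=2.0, k=K):
    """True if at least one k-mer of word reaches obs/exp >= threshold in every species."""
    kmers = {word[i:i + k] for i in range(len(word) - k + 1)}
    return any(all(obs_exp(kmer, ctx) >= threshold for ctx in contexts) for kmer in kmers)


if __name__ == "__main__":
    context = "ACGT" * 250  # hypothetical 1000 bp context, used for two species
    print(obs_exp("ACGTA", context))
    print(is_overrepresented("GGACGTAC", [context, context]))
```

In this toy context the 5-mer ACGTA occurs 249 times against an expectation of about 0.97, so its ratio is 256. Requiring the ratio in every species, rather than in the target alone, is what ties the ab initio signal to the comparative one.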

19 Improvement by combination
- m8: region score = number of short conserved stretches in a 200 bp window
- m8: region score = number of short conserved, overrepresented stretches in a 200 bp window
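The combination changes only which stretches enter the window count. As a hedged sketch, assume each conserved stretch has already been flagged as overrepresented or not (for example by an observed/expected test per species); the positions below are hypothetical, and a stretch counts toward a window if it starts inside it.

```python
from bisect import bisect_left


def window_counts(starts, seq_len, window=200, step=1):
    """Count stretch start positions in each window [w, w + window)."""
    starts = sorted(starts)
    return [
        bisect_left(starts, w + window) - bisect_left(starts, w)
        for w in range(0, max(seq_len - window, 0) + 1, step)
    ]


def compare_scores(stretches, seq_len, window=200, step=1):
    """stretches: (start, is_overrepresented) pairs.

    Returns (window_start, comparative_score, combined_score) for every window.
    """
    all_starts = [start for start, _ in stretches]
    over_starts = [start for start, flag in stretches if flag]
    plain = window_counts(all_starts, seq_len, window, step)
    combined = window_counts(over_starts, seq_len, window, step)
    window_starts = range(0, max(seq_len - window, 0) + 1, step)
    return list(zip(window_starts, plain, combined))


if __name__ == "__main__":
    stretches = [(10, True), (50, False), (150, True), (250, False), (260, True)]
    for row in compare_scores(stretches, seq_len=400, window=200, step=100):
        print(row)
```

Because the combined score only drops stretches, it can never exceed the comparative score in the same window; any improvement has to come from removing conserved stretches that lie outside enhancers.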

20 Improvement by combination

22 Discussion
- Using a combination of methods improves predictions.
- In the near future, regulatory regions could be found without knowing which transcription factors bind them, provided that enough related species are available.
- More features are needed to distinguish conserved regulatory regions from other functionally conserved regions.

23 References
- E. Emberly, N. Rajewsky, E. Siggia (2003). Conservation of regulatory elements between two species of Drosophila. BMC Bioinformatics 4:57.
- A. Nazina, D. Papatsenko (2003). Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 4:65 (Dec 22).

