
Analysis of gene regulatory regions by means of DNA composition

Nora Pierstorff 1, Bernhard Haubold 2, Thomas Wiehe 1
1 Dept. of Genetics, University of Cologne, Germany, Email: nora.pierstorff@uni-koeln.de
2 Dept. of Biotechnology and Bioinformatics, Univ. of Applied Sciences, Weihenstephan, Germany

Figure 1: over-representation of neighbors in the even-skipped region of Drosophila melanogaster. Annotated enhancers are marked grey. The CDS is marked blue.

Figure 2: over-representation of neighbors in the fushi-tarazu region of Drosophila melanogaster. Annotated enhancers are marked grey. The CDS is marked blue.

Abstract: We developed a software tool, termed “shustring” and based on suffix trees, for intra-genomic and intra-specific analysis of DNA sequences. This program determines the lengths of shortest unique substrings and the number of their close variants in a genome. Comparison of expected and observed length- and neighbor-distributions yields characteristic properties in intergenic and promoter regions. We investigate the statistical properties of shustrings in intra-genomic (intrinsic) as well as in inter-genomic analyses and present results of the method for several well studied examples of regulatory regions of developmental genes in Drosophila.

Introduction: There are three basic approaches to the prediction of regulatory elements. Some ab initio or intrinsic methods are based on the assumption that regulatory regions contain over-represented strings. A second approach, often with the help of position weight matrices, looks for consensus sequences of binding sites of known transcription factors. Finally, a third approach relies on the hypothesis that functional elements are more conserved than nonfunctional elements (phylogenetic footprinting). However, it is known [1] that binding sites are often not conserved even among closely related species, but may be subject to rapid evolutionary turn-over.
With the “shustring” approach, we are able to analyze an arbitrary number of sequences in a single run. The results are pointers to those stretches in the query sequence which are unusual with respect to the length of unique substrings and to the size of their neighborhood.

Ab initio method: The ab initio approach for the prediction of regulatory elements is to recognise regions which contain highly over-represented patterns. Shustring returns the length of the shortest unique substring for each position and the number and position of its neighbours. Neighbours are Hamming-1-neighbours, differing exactly at the last position. To avoid dependency on the length of the query sequence, we performed a sliding window analysis (window size 1000 bp). Based on an analytically derived probability distribution, we calculate the p-value of the number of observed neighbours. This calculation takes the length of the shustring and the GC-content of the sequence into account. Shustrings with a p-value < 0.05 are recorded. The relative frequency of the recorded shustrings in a window of 200 bp (step size 1 bp) is calculated and plotted in Figures 1 and 2.

Sequence comparison: Dermitzakis and Clark [1] noted that regulatory elements may evolve very rapidly. Hence, sequence comparison alone is often not sufficient to detect regulatory elements. The shustring method allows one to determine exceptionally long (indicative of sequence conservation) and exceptionally short unique substrings at the same time. As an example, we analyzed orthologous regions in Drosophila melanogaster and Drosophila virilis, which diverged about 40 Myr ago and show an average sequence identity of about 66.8%. In contrast to the ab initio analysis above, we record here shustrings with extreme lengths (p-value < 0.05) and plot their relative frequency in a sliding window of length 200 bp in Figures 3 and 4.
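The quantities used in the ab initio method can be illustrated with a small sketch. It is not the authors' implementation: the poster states that the tool is based on suffix trees, whereas the code below uses a naive substring search that is only practical for short sequences. It assumes that the shustring at a position is the shortest substring starting there that occurs exactly once on the given strand, and that a neighbour is any occurrence of the same prefix followed by a different final base. All function names are hypothetical. The p-value calculation is not reproduced because the poster does not give the underlying distribution; the windowing function simply takes a list of flags marking the recorded positions.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def shustring_length(seq, i):
    """Length of the shortest unique substring starting at position i.

    Returns None if every substring starting at i occurs elsewhere too.
    Naive search; an assumption-laden stand-in for the suffix tree approach.
    """
    for length in range(1, len(seq) - i + 1):
        if count_overlapping(seq, seq[i:i + length]) == 1:
            return length
    return None


def neighbour_count(seq, i, length, alphabet="ACGT"):
    """Count Hamming-1 neighbours that differ only at the last position."""
    prefix = seq[i:i + length - 1]
    last = seq[i + length - 1]
    return sum(count_overlapping(seq, prefix + base)
               for base in alphabet if base != last)


def windowed_frequency(flags, window):
    """Relative frequency of flagged positions in each window (step size 1)."""
    return [sum(flags[s:s + window]) / window
            for s in range(len(flags) - window + 1)]


seq = "ACGTACGA"
lengths = [shustring_length(seq, i) for i in range(len(seq))]
print(lengths)  # [4, 3, 2, 1, 4, 3, 2, None]
print(neighbour_count(seq, 0, lengths[0]))  # 1 ("ACGA" is the only neighbour of "ACGT")
```

In an analysis of the kind described above, the flags passed to `windowed_frequency` would mark positions whose neighbour count has a p-value below 0.05, and the window would be 200 bp.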
Discussion: The shustring method applied to one sequence finds shortest unique sequences and their Hamming-1-neighbours, which differ at the last position. Regions which contain many shustrings with an over-represented neighbourhood are candidates for regulatory regions. The results of our program are comparable to other methods which predict regulatory regions based on over-represented strings [2]. To improve the prediction, we added information obtained from sequence comparison with orthologous sequences from other species. Fast evolving as well as conserved regions may be detected at the same time based on extreme shustring lengths. The examples from Drosophila melanogaster and Drosophila virilis indicate that results improve with respect to the ab initio approach. Our approach is also clearly different from traditional alignment methods and may complement these, as shown in the lower panels of Figures 3 and 4.

Figure 3: Comparison of even-skipped regions of Drosophila melanogaster and virilis. Upper panel: shustrings of extreme lengths. Middle panel: Average conservation. Lower panel: Ahab prediction [3]. The colour scheme is as in Fig. 1.

Figure 4: As in Figure 3, but for the fushi-tarazu region.

Panel labels in Figures 3 and 4: Ahab prediction: binding site prediction based on PWMs. Alignment score: based on blastz alignment.

References:
[1] Dermitzakis E., Clark A. (2002). Evolution of Transcription Factor Binding Sites in Mammalian Gene Regulatory Regions: Conservation and Turnover. Mol. Biol. Evol. 19(7):1114-1121.
[2] Nazina A., Papatsenko D. (2003). Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 4:65.
[3] Rajewsky N., Vergassola M., Gaul U., Siggia E. D. (2002). Computational detection of genomic cis-regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3:30.

