3. Lecture WS 2004/05, Bioinformatics III: Transcription – Gene regulation


1 Transcription – Gene regulation
The machine that transcribes a gene is composed of perhaps 50 proteins, including RNA polymerase, the enzyme that converts DNA code into RNA code. A crew of transcription factors grabs hold of the DNA just above the gene at a site called the core promoter, while associated activators bind to enhancer regions farther upstream of the gene to rev up transcription. Working as a tightly knit machine, these proteins transcribe a single gene into messenger RNA. The messenger RNA winds its way out of the nucleus to the factories that produce proteins, where it serves as a blueprint for production of a specific protein.
Source: http://www.berkeley.edu/news/features/1999/12/09_nogales.html

2 Transcription in E. coli and in eukaryotes
Prokaryotes:
- Genes are grouped into operons.
- An mRNA may contain the transcript of several genes (poly-cistronic).
- Transcription and translation are coupled. The transcript is translated already during transcription.
- Gene regulation takes place by modification of the transcription rate.
Eukaryotes:
- Genes are not grouped in operons.
- Each mRNA contains only the transcript of a single gene (mono-cistronic).
- Transcription and translation are NOT coupled. Transcription takes place in the nucleus, translation in the cytosol.
- Gene regulation takes place via the transcription rate AND by RNA processing, RNA stability etc.

3 Promoter prediction in E. coli
To analyze E. coli promoters, one may align a set of promoter sequences by the position that marks the known transcription start site (TSS) and search for conserved regions in the sequences. E. coli promoters are found to contain 3 conserved sequence features:
- a region approximately 6 bp long with consensus TATAAT at position -10
- a region approximately 6 bp long with consensus TTGACA at position -35
- a distance between these 2 regions of ca. 17 bp that is relatively constant
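The three promoter features can be turned into a very simple scanner. The following is only a minimal sketch, not a published promoter predictor: the function names, the allowed spacer range of 15 to 19 bp around the "ca. 17 bp" distance, and the mismatch threshold are all assumptions chosen for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq, box35="TTGACA", box10="TATAAT",
                             spacers=range(15, 20), max_mismatches=2):
    """Return (pos35, spacer, pos10, mismatches) tuples for candidate promoters.

    A candidate is a -35-like hexamer followed, after an allowed spacer,
    by a -10-like hexamer, with at most `max_mismatches` mismatches in total.
    The spacer range and threshold are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        m35 = hamming(seq[i:i + len(box35)], box35)
        if m35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + len(box35) + spacer
            if j + len(box10) > len(seq):
                break  # larger spacers would run off the end as well
            total = m35 + hamming(seq[j:j + len(box10)], box10)
            if total <= max_mismatches:
                hits.append((i, spacer, j, total))
    return hits


# Toy sequence: perfect -35 box, 17 bp spacer, perfect -10 box.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"
print(find_promoter_candidates(toy, max_mismatches=0))
```

Because short hexamers occur often by chance, a scanner like this is expected to produce many false positives on real genomic sequence; it only illustrates how the consensus boxes and their spacing combine into one search pattern.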

4 Gene regulatory promoter network
In E. coli, 240 transcription factors have been verified that regulate 3000 genes. Binding site matrices are available for more than 55 E. coli TFs (Robison et al. 1998). In S. cerevisiae, genome-wide binding analysis of 106 transcription factors indicates that more than one-third of the promoter regions that were bound by regulators were bound by 2 or more regulators. This points to a highly connected network of transcriptional regulators.

5 Feasibility of computational motif search?
Computational identification of transcription factor binding sites is difficult because they consist of short, degenerate sequences that occur frequently by chance. The problem is not easy to define (and is therefore "complex") because
- the motif is of unknown size
- the motif might not be well conserved between promoters
- the sequences used to search for the motif do not necessarily represent the complete promoter
- the genes whose promoters are to be analyzed are in many cases grouped together by a clustering algorithm, which has its own limitations.
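How often a short motif occurs "by chance" can be estimated with a back-of-the-envelope calculation. The sketch below assumes independent, equally frequent bases (probability 0.25 each), which real genomes do not satisfy; the function name and the example numbers are illustrative only.

```python
def expected_chance_hits(motif_length, seq_length, n_sequences=1,
                         both_strands=False, base_prob=0.25):
    """Expected number of exact matches of one fixed motif in random DNA.

    Assumes every base is drawn independently with probability `base_prob`
    (a simplifying assumption, not a property of real genomes).
    """
    positions = max(seq_length - motif_length + 1, 0)
    expected = n_sequences * positions * base_prob ** motif_length
    return 2 * expected if both_strands else expected


# One fixed hexamer in 1000 bp upstream regions of 3000 genes:
# 3000 * 995 / 4096, i.e. several hundred matches expected by chance alone.
print(expected_chance_hits(6, 1000, n_sequences=3000))
```

Under these assumptions a single exact hexamer is already expected several hundred times in such a data set, and a degenerate motif that tolerates mismatches even more often, which is why a raw match count alone says little about function.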

6 Strategy 1
Arrival of microarray gene-expression data. For a group of genes with a similar expression profile (e.g. those that are activated at the same time in the cell cycle), one may assume that this profile is, at least partly, caused by and reflected in a similar structure of the regions involved in transcription regulation. Search for common motifs in upstream regions of fewer than 1000 bases. So far used: detection of single motifs (representing transcription-factor binding sites) common to the promoter sequences of putatively co-regulated genes. Better: search for the simultaneous occurrence of 2 or more sites within a given distance interval. The search becomes more sensitive.
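The idea of requiring two sites within a given distance interval can be sketched as follows. This is a minimal illustration using exact string matches; the function names, the example motifs, and the gap limits are assumptions, and a practical search would score degenerate matches instead.

```python
def find_all(seq, motif):
    """Start positions of all (possibly overlapping) exact matches of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def paired_sites(seq, motif_a, motif_b, min_gap, max_gap):
    """Return (pos_a, pos_b, gap) where motif_b starts min_gap..max_gap bases
    after the end of motif_a. Only the order a -> b is checked."""
    pairs = []
    b_positions = find_all(seq, motif_b)
    for a in find_all(seq, motif_a):
        end_a = a + len(motif_a)
        for b in b_positions:
            gap = b - end_a
            if min_gap <= gap <= max_gap:
                pairs.append((a, b, gap))
    return pairs


# Hypothetical upstream fragment with two arbitrary example motifs.
upstream = "AACACGTGTTTTTTGATAAGCC"
print(paired_sites(upstream, "CACGTG", "GATAAG", min_gap=0, max_gap=10))
```

Demanding both sites at a constrained spacing reduces the number of chance matches compared with searching for either motif alone.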

7 Motif identification
A flowchart to illustrate the two different approaches for motif identification. We analyzed 800 bp upstream from the translation start sites of the five genes from the yeast gene family PHO by the publicly available systems MEME (alignment) and RSA (exhaustive search). MEME was run on both strands, one occurrence per sequence mode, and found the known motif ranked as second best. RSA Tools was run with oligo size 6 and noncoding regions as background, as set by the demo mode of the system. The well-conserved heptamer of the motifs used by MEME to build the weight matrix is printed in bold.
Ohler, Niemann, Trends Genet. 17, 2 (2001)

8 Strategy 2: Exhaustive motif search in upstream regions
Exploit the finding that relevant motifs are often repeated many times, possibly with small variations, in the upstream region for the regulatory action to be effective. Search the upstream region for overrepresented motifs:
(1) Group genes based on the overrepresented motifs.
(2) Analyze sets of genes that share motifs for coregulation in microarray experiments.
(3) Consider overrepresented motifs labelling sets of co-regulated genes as candidate binding sites.
Cora et al. BMC Bioinformatics 5, 57 (2004)
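The first step, finding overrepresented motifs by exhaustive enumeration, can be sketched as a k-mer count against a background. This is a deliberately simplified illustration and not the statistic used by Cora et al.: the frequency ratio, the pseudocount, the threshold, and all names are assumptions.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def overrepresented_kmers(upstream, background, k=6, min_ratio=2.0,
                          pseudocount=1.0):
    """Rank k-mers by (frequency in upstream set) / (frequency in background).

    The pseudocount keeps k-mers absent from the background from causing a
    division by zero. Ratio and threshold are illustrative choices.
    """
    fg = count_kmers(upstream, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    results = []
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        ratio = fg_freq / bg_freq
        if ratio >= min_ratio:
            results.append((kmer, n, ratio))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy data: the hexamer CACGTG is planted in every "upstream" sequence.
upstream = ["TTCACGTGTT", "GGCACGTGAA", "ACACGTGCCA"]
background = ["ACGTACGTACGTACGTAGCTAGCTAGCATCGATCGA"]
top = overrepresented_kmers(upstream, background)[0]
print(top[0], top[1])
```

Genes could then be grouped by the top-ranking k-mers they contain, which corresponds to step (1) in the list above.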

9 Exhaustive motif search in upstream regions
Cora et al. BMC Bioinformatics 5, 57 (2004)

10 Exhaustive motif search in upstream regions
Cora et al. BMC Bioinformatics 5, 57 (2004)

11 Exhaustive motif search in upstream regions
Cora et al. BMC Bioinformatics 5, 57 (2004)

12 Recently published tools for promoter finding
Ohler, Niemann, Trends Genet. 17, 2 (2001)

13 Position-specific weight matrix
A popular approach when a list of genes that share a TF binding motif is available and a good multiple sequence alignment of the sites exists. Alignment matrix: lists the number of occurrences of each letter at each position of an alignment.
Hertz, Stormo (1999) Bioinformatics 15, 563
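An alignment matrix and a weight matrix derived from it can be built in a few lines. The sketch below uses one common choice, log-odds weights with a pseudocount against a uniform background; the function names, the pseudocount value, and the example sites are assumptions for illustration.

```python
import math

BASES = "ACGT"


def alignment_matrix(sites):
    """Count occurrences of each base at each position of aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = {b: [0] * length for b in BASES}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[base][pos] += 1  # KeyError for letters other than A, C, G, T
    return counts


def weight_matrix(counts, background=None, pseudocount=1.0):
    """Log-odds weights ln(f / p) with a pseudocount (one common convention)."""
    background = background or {b: 0.25 for b in BASES}
    length = len(counts["A"])
    n_sites = sum(counts[b][0] for b in BASES)
    weights = {b: [] for b in BASES}
    for b in BASES:
        for pos in range(length):
            freq = (counts[b][pos] + pseudocount * background[b]) / (n_sites + pseudocount)
            weights[b].append(math.log(freq / background[b]))
    return weights


def score(weights, window):
    """Sum of position-specific weights for a window of matrix length."""
    return sum(weights[base][pos] for pos, base in enumerate(window.upper()))


# Hypothetical aligned sites resembling the -10 consensus TATAAT.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
counts = alignment_matrix(sites)
weights = weight_matrix(counts)
print(counts["T"])
print(score(weights, "TATAAT") > score(weights, "GGGCCC"))
```

Sliding `score` along a sequence and reporting windows above a cut-off is the usual way such a matrix is applied to find putative binding sites.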

14 Position-specific weight matrix
Examples of matrices used by YRSA
http://forkhead.cgb.ki.se/YRSA/matrixlist.html

15 Identification of TF binding site: DNase I footprinting
A protein bound to a specific DNA sequence will interfere with the digestion of that region by DNase I. An end-labelled DNA probe is incubated with a protein extract or a purified DNA-binding factor. The unprotected DNA is then partially digested with DNase I such that on average every DNA molecule is cut once. Digestion products are then resolved by electrophoresis. Comparison of the DNase I digestion pattern in the presence and absence of protein will allow the identification of a footprint (protected region).
[Figure: footprint experiment resolved by denaturing PAGE; the protected region is the footprint]

16 Gel shifts: Electrophoretic Mobility Shift Assay (EMSA), band shift
A purified protein, or a complex mixture of proteins (e.g. nuclear or cell extract), is incubated with a 32P end-labelled DNA fragment containing the putative protein binding site (from the promoter region). Reaction products are then analysed on a non-denaturing polyacrylamide gel. The specificity of the DNA-binding protein for the putative binding site is established by competition experiments using DNA fragments or oligonucleotides containing a binding site for the protein of interest, or other unrelated DNA sequences.
[Figure: gel retardation assay on non-denaturing PAGE; lanes "no protein" and "add protein"; bands: free DNA probe, and retarded mobility due to protein binding]

17 3D structures of transcription factors
http://www.rcsb.org
Structures shown: 1A02.pdb, 1AM9.pdb, 1AU7.pdb, 1CIT.pdb, 1GD2.pdb, 1H88.pdb (one panel is annotated "2 TFs bound!").
TFs bind with very different binding modes. Some are sensitive to DNA conformation.

18 DNA conformation
Conformational fluctuations of a B-DNA oligomer with an alternating GA sequence. The snapshots (100 ps intervals) from a simulation at 300 K using explicit solvent and counterions show axis and backbone fluctuations.
Canonical and mechanically distorted forms of helical DNA (from left to right: A-DNA, B-DNA, overstretched S-DNA, overtwisted P-DNA).
E. Giudice, R. Lavery (2002) Acc. Chem. Res. 35, 350-357.

19 DNA conformation
Induced base opening within B-DNA. Images show the conformational changes associated with moving thymine (bold) into the major groove of an oligomer with an alternating GA sequence.
E. Giudice, R. Lavery (2002) Acc. Chem. Res. 35, 350-357.

20 EM low-resolution structure of TF machinery
[Figure panels: single particle images; 3D reconstruction of TFIID]
Nogales et al. Science (1999)

21 Identification of individual components
Position of IIB and IIA on the TFIID structure and mapping of the TBP. The blue mesh corresponds to the holo-TFIID, with the A, B, and C lobes indicated. (A) The green mesh corresponds to the density difference between the holo-TFIID and the TFIID-IIB complex. (B) The magenta and green meshes show the density difference between the holo-TFIID and the trimeric complex TFIID-IIA-IIB. The density depicted in light green can be attributed to TFIIB by comparison with (A), and the magenta density therefore corresponds to IIA. (C) The yellow mesh shows the density difference between the holo-TFIID and TFIID that is bound to the TBP antibody.
Nogales et al. Science (1999)

22 Database for eukaryotic transcription factors: TRANSFAC
BIOBase / TU Braunschweig / GBF
Relational database, 6 flat files:
- FACTOR: interaction of TFs
- SITE: their DNA binding sites
- GENE: the target genes that they regulate through these sites
- CELL: factor source
- MATRIX: TF nucleotide weight matrices
- CLASS: classification scheme of TFs
Wingender et al. (1998) J Mol Biol 284, 241

23 Database for eukaryotic transcription factors: TRANSFAC
BIOBase / TU Braunschweig / GBF
Matys et al. (2003) Nucl Acid Res 31, 374

24 Match™
Search for putative TF binding sites in DNA sequences based on weight matrices. Two values are used to score putative hits:
- Matrix similarity score: quality of a match between the sequence and the whole matrix, in the range [0,1]
- Core similarity score: quality of a match between the sequence and the core sequence of a matrix, which consists of the five most conserved consecutive positions in a matrix, in the range [0,1]
Profile: set of matrices and their cut-offs designed for function-driven searches. Special profiles are available for immune cells, muscle cells, liver cells, and for the cell cycle.
Matys et al. (2003) Nucl Acid Res 31, 374
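The two scores can be illustrated with a simple min-max normalisation of the matrix score, which maps any window to [0,1]. This is only a sketch in the spirit of the description above and is not claimed to be the exact formula used by Match: the normalisation, the way the core is chosen (five consecutive columns with the largest summed maximum counts), and all names are assumptions.

```python
def similarity(columns, window):
    """Min-max normalised matrix score in [0, 1] for one window.

    `columns` is a list of dicts mapping base -> count, one dict per position.
    """
    if len(columns) != len(window):
        raise ValueError("window length must equal matrix length")
    current = sum(col[base] for col, base in zip(columns, window.upper()))
    lowest = sum(min(col.values()) for col in columns)
    highest = sum(max(col.values()) for col in columns)
    if highest == lowest:
        return 1.0  # uninformative matrix: every window scores the same
    return (current - lowest) / (highest - lowest)


def core_start(columns, core_len=5):
    """Start of the most conserved run of `core_len` consecutive columns,
    using the summed maximum count as a simple conservation proxy."""
    return max(range(len(columns) - core_len + 1),
               key=lambda s: sum(max(c.values()) for c in columns[s:s + core_len]))


def match_like_scores(columns, window, core_len=5):
    """Return (matrix similarity, core similarity) for one window."""
    s = core_start(columns, core_len)
    core = similarity(columns[s:s + core_len], window[s:s + core_len])
    return similarity(columns, window), core


# Hypothetical 7-column count matrix built from 10 sites.
columns = [
    {"A": 3, "C": 2, "G": 3, "T": 2},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 2, "C": 3, "G": 2, "T": 3},
]
print(match_like_scores(columns, "GTATAAC"))
print(match_like_scores(columns, "GTATACC"))
```

A hit would then be reported only if both values exceed the cut-offs stored for that matrix in the chosen profile.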

25 Database for eukaryotic transcription factors: TRANSFAC
BIOBase / TU Braunschweig / GBF
Matys et al. (2003) Nucl Acid Res 31, 374

26 TRANSFAC classification
1 Superclass: Basic domains
  1.1 Leucine zipper factors (bZIP)
  1.2 Helix-loop-helix factors (bHLH)
  1.3 bHLH-bZIP
  1.4 NF-1
  1.5 RF-X
  1.6 bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
  2.1 Cys4 zinc finger of nuclear receptor type
  2.2 Diverse Cys4 zinc fingers
  2.3 Cys2His2 zinc finger domains
  2.4 Cys6 cysteine-zinc cluster
  2.5 Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
4 Superclass: beta-Scaffold factors with minor groove contacts
5 Superclass: Others
http://www.gene-regulation.com/pub/databases/transfac/cl.html

27 TRANSFAC classification
Entry for 1.1 Leucine zippers
http://www.gene-regulation.com

28 TRANSFAC classification
http://www.gene-regulation.com

29 TRANSFAC classification
http://www.gene-regulation.com

30 Summary
- Large databases (e.g. TRANSFAC) are available with information about promoter sites. The information is verified experimentally.
- Microarray data allow searching for common motifs of coregulated genes. Also possible: common GO annotation etc.
- TF binding motifs are frequently overrepresented in the 1000 bp upstream region. The function of this overrepresentation is not clearly understood. (Same as in proline-rich recognition sequences.)
- Relatively few TFs regulate a large number of genes. The result is a complex regulatory network (see Thursday's lecture).
http://www.gene-regulation.com

