Too many matches….


1 Too many matches…

2 Too many matches…
A typical question: What are the potential TF sites involved in regulation of my gene of interest?
A typical approach: "Let's run MatInspector over the promoter region of my gene."

3 Too many matches…
A typical question: Where do I get my input promoter DNA sequence from?
A typical approach: "Let's extract it from NCBI, 3 kb upstream of the TSS, to be sure to have the promoter…"

4 Too many matches… A typical result:
Which of those matches are relevant? How do I get rid of all those "false positives"?

5 TF binding sites… Important facts to consider:
There is not a single false positive match.
MatInspector gives you all physical TF binding sites.
A physical TFBS is found every 10 to 15 bp throughout the genome.
A single isolated TF binding site carries no function.
TFs work through complexes, which are represented at the sequence level by sets of TF binding sites in a certain distance relationship and orientation -> promoter frameworks.

6 TF binding sites…
Okay, so what is a physical TF binding site?
What is a functional TF binding site?

7 Physical binding sites have no function in transcription on their own
False positives?
A physical binding site is invariable.
A physical binding site is a fixed part of the genome = weight matrix / IUPAC string.
Physical binding sites can be detected by MatInspector.
This DNA sequence can usually bind to its cognate protein(s).
Physical binding sites have no function in transcription on their own.
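
To make the "IUPAC string" idea concrete, here is a minimal sketch of how a physical site can be located by plain pattern matching on both strands. It is not how MatInspector works internally (MatInspector uses weight matrices); the function names and the example pattern `TATAWA` are illustrative assumptions only.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse complement of a DNA or IUPAC string."""
    return pattern.translate(COMPLEMENT)[::-1]


def find_sites(sequence, iupac_pattern):
    """Return sorted (start, strand) tuples for every match on either strand.

    Positions are 0-based and refer to the forward strand.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", iupac_pattern.upper()),
                            ("-", reverse_complement(iupac_pattern.upper()))):
        regex = "".join(IUPAC[base] for base in pattern)
        # A lookahead reports overlapping matches as well.
        for match in re.finditer(f"(?=({regex}))", sequence):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical example: a TATA-like pattern in a short toy sequence.
print(find_sites("GGTATAAAGG", "TATAWA"))
```

Such a scan reports every place where the pattern fits, which is exactly the point of the slide: each hit is a physical site, and nothing in the scan says whether it is functional.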

8 Physical vs functional TFBS
A functional binding site depends on context!
A functional binding site requires a cellular context: one binding site, five cell types... but binding proteins are present in only 2 cell types! -> no functional binding site in the other 3 cell types!
A functional binding site requires a genomic context: even when binding proteins are present... biological function may require additional binding sites! (Module)
Transcriptional function is defined by the cellular and genomic context.

9 The core promoter - just another module
Transcriptional modules
A transcriptional module is the smallest functional unit.
A transcriptional module consists of two or more TFBSs.
Strand orientation, relative order and distance of TFBSs are important.
A module also has a strand orientation and can shift within a promoter.
Transcriptional modules are present in promoters and enhancers.
The core promoter (TATA box, INR) is just another module.
Figure labels: F1 +, F2 -, F3 +/-.
Transcriptional modules integrate signals via the interacting TFs.
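
The three constraints named above (strand orientation, relative order, distance) can be sketched as a simple check over a list of predicted sites. This is a toy illustration, not ModelInspector; the `Site` and `Module` types, the field names and the example coordinates are all hypothetical, and the sketch only tests the module in its forward orientation.

```python
from typing import NamedTuple


class Site(NamedTuple):
    family: str    # TF matrix family label, e.g. "F1"
    position: int  # anchor position in the sequence (bp)
    strand: str    # "+" or "-"


class Module(NamedTuple):
    first: str          # family of the upstream element
    first_strand: str
    second: str         # family of the downstream element
    second_strand: str
    min_dist: int       # minimum distance between the two anchors (bp)
    max_dist: int       # maximum distance between the two anchors (bp)


def find_module_matches(sites, module):
    """Return (first_site, second_site) pairs satisfying strand, order and distance."""
    matches = []
    for a in sites:
        if (a.family, a.strand) != (module.first, module.first_strand):
            continue
        for b in sites:
            if (b.family, b.strand) != (module.second, module.second_strand):
                continue
            # Positive only when b lies downstream of a, which enforces the order.
            distance = b.position - a.position
            if module.min_dist <= distance <= module.max_dist:
                matches.append((a, b))
    return matches


# Hypothetical sites: only the first F1/F2 pair is close enough and correctly oriented.
sites = [Site("F1", 100, "+"), Site("F2", 130, "-"),
         Site("F2", 400, "-"), Site("F1", 500, "-")]
module = Module("F1", "+", "F2", "-", min_dist=5, max_dist=50)
print(find_module_matches(sites, module))
```

Of the four individual sites, only one pair forms a module match, which illustrates how a module definition filters a long list of physical sites down to a few candidates.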

10 Why does nature use modules?
No common organization? Common modules! (Figure labels: elements A, B and C.)

11 Transcriptional modules
Promoter modules can work in three different ways: synergistic, antagonistic, synergistic.
Module types shown: "short range module" (distance ≤ 50 bp), "composite elements", "short range module" (distance ≤ 50 bp), "looping module" (distance up to 300 bp).
Binding affinity: high/low is possible (three cases); high/high only (one case).

12 Modules are the basic elements of regulatory pathways and networks
Transcriptional modules
Transcriptional modules define target genes of pathways.
NFkappaB regulates a number of "target genes": IL-6, IL-8, ICAM-1, SAA-1, SAA-2, ELAM-1, IFN-ß, IP-10, G-CSF, IL-2, HLA-A, HLA-B, IL-1, E-Selectin, IRF-1.
NFkappaB is involved in regulation of target genes of several pathways.
Figure labels: NFkB, CREB, C/EBP, IRF-1; induced by 2 pathways!
Modules are the basic elements of regulatory pathways and networks.

13 Transcriptional modules
Key-lock principle: protein complex binding.
Figure labels: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TBP, RNA polymerase II; TATA, INR; core, proximal promoter, distal promoter/enhancer; transcription factor binding sites; "DNA looping".

14 Transcription regulation implies a regulatory network
Transcriptional modules
Transcription regulation mechanism.
Figure labels: promoter, exon, primary transcript, protein complex; Gene A, transcript n; Gene B, transcript p; Gene C, transcript m.
Transcription regulation implies a regulatory network.

15 Same lock – different keys: Same gene - different biological context
Transcriptional modules
Context dependent expression by different protein complexes.
Figure labels (two complexes): TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, TBP, TATA; the second complex also shows INR.
Same lock, different keys: same gene, different biological context.

16 Transcriptional modules
Context specific transcription regulation.
Example: analysis of the RANTES promoter in different cell lines.
Experimentally verified evidence that TFBSs from modules that are crucial for regulation in one biological context (cell type) are totally irrelevant in another!
Fessele, S., Maier, H., Zischek, C., Nelson, P.J., Werner, T. (2002) "Regulatory context is a crucial part of gene function", Trends in Genetics 18.

17 Module matches reduce experimental efforts by orders of magnitude
Transcriptional modules
Modules contribute strongly to functional promoter analysis.
Modules are usually linked to at least one known biological function.
A module match in a promoter makes this gene a good candidate.
A module match in a promoter does not prove the gene to be a target.
Additional independent evidence is required to prove the target.
A module match immediately suggests experimental verification.
Module matches reduce experimental efforts by orders of magnitude.

18 Promoter sequences
Very interesting, but how does all this help me with my original question? The question still is: What are the potential TF sites involved in regulation of my gene of interest?

19 Promoter sequences
More things to consider before asking that question! There was another one: Where do I get my input promoter DNA sequence from?
"Let's extract it from NCBI, 3 kb upstream of the TSS, to be sure to have the promoter…"

20 Promoter sequences
More things to consider…
3 kb is too large for meaningful analysis.
Even going 10 kb upstream of the TSS is no guarantee of having the relevant promoter sequence.
Multiple promoters are the rule, not an exception.
The non-coding first exon is always part of the promoter.
Huh? What does this mean? Where do I get this damn promoter now?
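
As a rough illustration of "a short window that includes part of the first exon", the sketch below computes strand-aware window coordinates around a TSS. The default lengths (500 bp upstream, 100 bp downstream) are assumptions chosen for the example, not values taken from the slides, and the function is hypothetical rather than part of any Genomatix tool.

```python
def promoter_window(tss, strand, upstream=500, downstream=100):
    """Return 0-based, half-open (start, end) genomic coordinates of a promoter window.

    tss is the 0-based coordinate of the first transcribed base. The window covers
    `upstream` bases before the TSS and `downstream` transcribed bases starting at
    the TSS, so it reaches into the first exon.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" means higher genomic coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end  # clip at the beginning of the contig


# The same TSS coordinate gives different windows depending on the strand.
print(promoter_window(1000, "+"))
print(promoter_window(1000, "-"))
```

Because each alternative transcript has its own TSS, a gene with several transcripts yields several such windows, one per promoter.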

21 Genes usually have alternative transcripts with alternative promoters
Alternative transcripts/promoters
Which promoter? One gene = one promoter? (Figure labels: Gene A? Gene A? Gene A?)
Genes usually have alternative transcripts with alternative promoters.

22 Alternative transcripts/promoters
Context dependent expression via different promoters.
Example: glucokinase. Figure labels: coding exons, hepatic promoter, pancreatic promoter.
Y Tanizawa, A Matsutani, KC Chiu, and MA Permutt. Human glucokinase gene: isolation, structural characterization, and identification of a microsatellite repeat polymorphism. Mol. Endocrinol., Jul 1992; 6:

23 Alternative transcripts/promoters
Comparative genomic map of glucokinase (GCK).
Promoter set 1: pancreatic promoter.
Promoter set 2: hepatic promoter.
Data from ElDorado.

24 Alternative transcripts/promoters
Important facts to consider:
Alternative promoter usage is often tied to regulation of tissue specific gene expression.
Alternative promoter usage is of very high biological relevance.
There are several examples where aberrant regulation of the identical primary transcript leads to severe biological effects.

25 Alternative transcripts/promoters
Aromatase: a switch in promoter usage is associated with disease.
Figure labels: 1.1, 1.4, 1.f, 1.6, 1.3, 1, II, III, IV, V, VI, VII, VIII, IX, X, AATAAA; normal breast, breast cancer.
The gene product is absolutely identical. The only difference is in the alternative promoter usage. At the transcript level this can be seen only in the non-coding first exon.

26 Promoter Analysis The aim of in silico promoter analysis - summary
1. Identification of the promoter sequence
2. Prediction of physical transcription factor binding sites
3. Functional context
4. Context dependent functional transcription factor binding sites
Figure labels: context 1, context 2, context 3, ..., context n.

27 ElDorado promoter sequence retrieval
Yes! I know all of this! I just wanted to know where I can get my promoter sequence(s) easily!
www.genomatix.de: if you don't have an account already, sign up for a free evaluation account first... then log in here!

28 ElDorado promoter sequence retrieval

29 ElDorado promoter sequence retrieval
Choose the organism.
Either enter the locus ID or the gene name here...
...or an accession number...
...or an exact contig position.
Upload a file from your local disk...
...or choose a sequence file from your directory...
...or copy & paste a raw sequence here. It can be cDNA or whatever you have. It will be mapped exactly to the genomes within seconds.

30 ElDorado promoter sequence retrieval
Input in this section delivers results based on gene name or keyword search. Over a million names, synonyms and gene IDs help you find what you want, fast! HMGCS1 (for example)
IMPORTANT! Affymetrix probe-set-ID input: our annotation is NOT based on the Affymetrix NetAffx assignment! Rather, it is based on genomic mapping of each single probe. A transcript will be retrieved if at least one probe of the set (usually 11 probes) matches. For mixed probe sets (cross-hybridisation), all relevant transcripts will be retrieved, which might lead to a result with transcripts from different loci.
Input in this section delivers results based on ultra-fast sequence mapping. Copy and paste raw sequence data here (min. 15 nucleotides) or enter an accession number. In contrast to the entry of an accession number above, here the sequence is actually retrieved from the database and mapped onto the genome(s). NOTE: many EST-based accession numbers have poor sequence homology and deliver no result.

31 ElDorado promoter sequence retrieval
… here you can choose which chip´s probes to see... … licensed customers can add their own sequence data

32 ElDorado promoter sequence retrieval
This gives you an interactive graphical representation of the genomic context of your gene

33 ElDorado promoter sequence retrieval
Orange indicates your input, in this case a gene name. It is very informative when your query is based on sequence data: then you see the mapping positions.
Switch the display of components on and off.
Mapping positions of single Affymetrix probes!
Scale/slide the retrieved genomic "window".
Select regions of the graphics and save them to a file.
Here you can scale the view.
Everything is clickable, so just play around!

34 ElDorado promoter sequence retrieval
Now we have zoomed into the promoter region. Clicking on this transcriptional start region (TSR)... ...displays this hyperlink to ...

35 ElDorado promoter sequence retrieval
...this profile of the different experimentally verified TSS (CAGE tags) in the different tissue types.

36 ElDorado promoter sequence retrieval
This is a table-like representation of all annotated elements. It is especially useful for quick and easy retrieval of the DNA sequence(s) of interest.

37 ElDorado promoter sequence retrieval
Tick/un-tick the boxes of what you would like to see, and then...

38 ElDorado promoter sequence retrieval
This, for instance... ...tells you that this SNP deletes three potential TF binding sites and creates a new one. A potentially regulatory SNP...

39 ElDorado promoter sequence retrieval
From here you can directly run a MatInspector analysis for this promoter... ...again, play around with the interactive graphics... Click the symbols and jump right into MatBase, the TF knowledge base.

40 ElDorado promoter sequence retrieval
Now, finally, the first way to extract a promoter sequence... ...and/or any other element displayed in the list below. Choose your desired length. Unless you have a good reason to change the length of the proximal promoter, leave the defaults!

41 ElDorado promoter sequence retrieval
This shows you all annotated alternative transcripts plus all Affymetrix probe set single probe mappings plus another way to extract your promoter sequence(s)

42 ElDorado promoter sequence retrieval
You know this already... Three different known transcripts for this locus... ...and four distinct promoters! How this comes about, I'll tell you in a minute.

43 ElDorado promoter sequence retrieval
Tick the promoter of interest... ...choose the format... ...and extract the sequence.
Or submit the promoter directly to MatInspector for graphical analysis. It works on a single sequence, too.
Or submit sequences directly to one of those tasks. But they make sense only with multiple sequences. More on that later!

44 ElDorado promoter sequence retrieval
But why do I have four promoters here? And two don't even have a transcript assigned, as it is written here! And what's all that CompGen thing about?
The multiple promoter thing I showed you before. Remember the GCK example, liver and pancreas? Now to the CompGen promoters. They are derived by a proprietary comparative genomics approach.

45 ElDorado promoter sequence retrieval

46 ElDorado promoter sequence retrieval
The tick-boxes you know already... We need them for later promoter retrieval.
For our example we have a homologous locus assigned in chimp, macaca, human, rat, dog, cow, opossum, chicken, and zebrafish.
Exhaustive cross-mapping of all transcripts to all genomes of all organisms in ElDorado generates our homology groups.
Note the Promoter Set number!

47 ElDorado promoter sequence retrieval
Get a feeling for the degree of phylogenetic conservation of the respective promoter. See how much experimental evidence supports this promoter.

48 ElDorado promoter sequence retrieval
A Promoter Set represents phylogenetically conserved promoters. You should be familiar with this view by now. Here the orange indicates a promoter belonging to a promoter set. With these tick-boxes you can switch the display of the different Promoter Sets on and off.

49 ElDorado promoter sequence retrieval
Don't waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take?
Well, which one? If you do not have any other information (experimental or from literature), I would recommend that you consider all available alternative promoters for further analysis.

50 ElDorado promoter sequence retrieval
Don't waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take?
I showed you two easy ways of retrieving a promoter sequence with two mouse clicks some minutes ago. There are more... oh... you cannot access these options?

51 ElDorado promoter sequence retrieval
You should license GenomatixSuite with at least the 10-fold evaluation account upgrade. Otherwise it is slightly more cumbersome...
Use one of the options I showed you before and get the contig and positional information... and use that for sequence retrieval from your favorite system after Genomatix, e.g. NCBI.
Hint: If you are interested in the TF results rather than the sequence, use the "search for common transcription factor binding sites" option as shown before.

52 From physical to functional TF site
Quite interesting… But I am not a single step closer to the answer to my real question: What are the potential TF sites involved in regulation of my gene of interest?
Well, I think you are. The essential first step is to analyze the right sequence at a length that allows for meaningful results. Now that you have the real promoter sequence(s), let's see how to go on from here...

53 From physical to functional TF site
Then we have to look for additional evidence that some of the physical TF sites might be functional ones. The best option would be a chromatin IP experiment. However, for that you would need some hints as to which TF to make or buy antibodies for. Further computer analysis is required anyhow!
The ideal situation for determining potential functional binding sites would be to have a set of genes that are apparently co-regulated in the given cellular and experimental context, for instance from a microarray experiment. A comparative promoter analysis with FrameWorker would very likely give you a pattern of involved TFs, as shown in numerous publications (see our web site at "About us -> Publications").
But I have only a single gene. And that's the one I am interested in!
There are three different roads to go...

54 From physical to functional TF site
1. We talked about promoter modules before. Search your sequence for promoter modules with ModelInspector. Our Promoter Module Library contains over 550 promoter modules, each of them experimentally verified to carry transcriptional regulatory activity. A module match increases the probability that an involved TF site is functional.
2. Look for phylogenetically conserved patterns of TF sites in a comparative genomics promoter set with FrameWorker. TFs that are part of such phylogenetically conserved frameworks have a higher probability of being functional.
3. Do extensive literature data mining with BiblioSpherePE for known TF correlations, pathway analysis and gene set creation for comparative promoter analysis. TFs showing biological activity in another experimental context are functional (at least in that context).
Okay, how do I do this? Let's go!

55 ElDorado promoter sequence retrieval
Let's start with an analysis for promoter modules...

56 Search for promoter modules
If you are licensed, you can have a quick look at the promoter module library. Each module is experimentally verified to carry regulatory activity.

57 Search for promoter modules
Choose a sequence file from your directory. Or copy & paste a raw sequence here. Or… you know the rest! Don't click anything below, unless you want to scan an entire database!

58 Search for promoter modules
Go for vertebrate modules... Click here! You can wait for the result…

59 Search for promoter modules

60 Search for promoter modules

61 Search for promoter modules
Now we have narrowed things down to 21 very interesting positions in this promoter, with modules that are composed of a total of 11 different transcription factor binding sites. Our arbitrarily chosen example HMGCS1 belongs to the cholesterol biosynthesis pathway. Some of the promoter modules found do have a proven function in sterol regulation!
Wow! That's impressive! But that example is a mock-up, isn't it?
Not really. It is a nice example to show this approach. Very frequently one finds functionally related modules. However, there is no guarantee… It just adds another line of evidence.

62 Phylogenetically conserved frameworks
Okay, how does the other thing help? What did you call it, phylogenetically conserved frameworks?
That's right. For this approach you first need a set of phylogenetically conserved promoters. Remember several slides before?

63 ElDorado promoter sequence retrieval
Inspect and choose your Promoter Set... and tick the promoters of one set. In this example I chose Promoter Set 3 for human, rat, dog and cow. ...scroll to the top of the page...

64 Phylogenetically conserved frameworks
...scroll down... From here you can have a look at TF binding sites which are common to the input promoters.
Great! That is what I really want to know: Which TF sites do they have in common?

65 Phylogenetically conserved frameworks
Great! That is what I really want to know: Which TF sites do they have in common?
Be careful!! This is no more than a tiny hint! I can show you many cases where totally unrelated exons have more TF sites in common than closely co-regulated promoters. What you are really looking for is a conserved pattern of TF sites. And we are going to look for one. But first let's have a look at the nucleotide sequence level...

66 Phylogenetically conserved frameworks
DiAlign TF gives an overlay of a true multiple sequence alignment (not pairwise) and common TF sites. Check DiAlign for other sequences (including amino acids)! It is extremely fast and especially powerful for finding short homologies in largely unrelated sequences.

67 Phylogenetically conserved frameworks
The parameters should be self explanatory. You can always click for help

68 Phylogenetically conserved frameworks
Here is an output example.

69 Phylogenetically conserved frameworks
Why did you do this? What does it tell me?
It is pretty informative to get a feeling for the degree of homology, which parts are more conserved than others, and which TF binding sites reside in the homologous parts. It is also of interest to see where the evolutionary pressure was on functional conservation (TFBS) rather than on sequence conservation.

70 Phylogenetically conserved frameworks
Why did you do this? What does it tell me?
Also, if you do a framework analysis on two highly homologous sequences, you run into a combinatorial explosion. FrameWorker checks for it and might give you a warning. However, in this case everything is fine...

71 Phylogenetically conserved frameworks
Now, we finally go to the FrameWorker analysis!

72 Phylogenetically conserved frameworks
Here you can choose the matrix library.
Here you can select only those TFs known to be associated with certain tissues. This filter is a positive filter! Only TFs known to be associated with a tissue are listed here. A TF not being listed for a certain tissue does NOT mean that it is not expressed there! It just has not been reported yet.

73 Phylogenetically conserved frameworks
"More options" gives you... ...well, more options! Don't change those parameters unless you know exactly what you are doing!

74 Phylogenetically conserved frameworks
If you know that a certain TF is involved in the regulation of your gene, make it a mandatory element and search only for frameworks containing it. Mandatory elements are most helpful in focusing your analysis. If you don't know one a priori, I'll show you later in BiblioSpherePE how to get to those. Toggle multiple choices by holding the "Ctrl" key when clicking!
This decides the number of input sequences which have to show a common pattern of TF sites.
This sets the distance constraints between two adjacent TF sites. More important than the absolute distance is the distance variance. Always start at default values (unless you already know better) and relax gradually if nothing meaningful is found.
One word on this parameter: it decides the minimum/maximum number of TF sites allowed in one framework. In this case I increased the default value from 6 to 10, since we want to identify the largest conserved pattern in this phylogenetic promoter set. We might lower this later.
This option gives you an idea of the specificity of the frameworks found. It checks how often a framework would be found in a background of random human promoter sequences. Use it with care! It slows down FrameWorker considerably!
And always think about the HELP pages!
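
To get a feeling for how the quorum, the distance range and the distance variance interact, here is a toy sketch for a single pair of adjacent TF sites. It is not FrameWorker's algorithm; in particular, reading "distance variance" as the largest allowed spread (max minus min) of the pair distance across sequences is an assumption, and all names and numbers are hypothetical.

```python
def sequences_supporting_pair(distances, min_dist, max_dist, max_variance):
    """Largest number of sequences whose pair distance is mutually compatible.

    distances holds one entry per input sequence: the observed distance (bp)
    between two adjacent TF sites, or None if the pair is absent.
    """
    valid = sorted(d for d in distances
                   if d is not None and min_dist <= d <= max_dist)
    best = 0
    left = 0
    # Sliding window over the sorted distances: keep the spread within max_variance.
    for right in range(len(valid)):
        while valid[right] - valid[left] > max_variance:
            left += 1
        best = max(best, right - left + 1)
    return best


def meets_quorum(distances, quorum, min_dist, max_dist, max_variance):
    """True if at least quorum (a fraction, 0..1) of all sequences support the pair."""
    support = sequences_supporting_pair(distances, min_dist, max_dist, max_variance)
    return support >= quorum * len(distances)


# Hypothetical distances in four promoters; the pair is missing in the last one.
observed = [20, 24, 31, None]
print(meets_quorum(observed, quorum=0.5, min_dist=5, max_dist=200, max_variance=10))
print(meets_quorum(observed, quorum=0.75, min_dist=5, max_dist=200, max_variance=10))
```

Lowering the quorum or raising the allowed variance lets more sequences count as supporting the pair, which is why relaxing these parameters step by step finds more, but less selective, frameworks.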

75 Phylogenetically conserved frameworks
All four promoters have 18 TF sites in common. This number might differ from the "search for common TF" job earlier, since now we take strand specificity into account.
The longest frameworks contain 8 TF sites. There are 4 different frameworks. If you click the link, you jump directly to the graphical representation.

76 Phylogenetically conserved frameworks
Here you see the detailed description of the framework. It is perfectly conserved throughout the species. You can save this framework in your personal directory for subsequent sequence or database scans. Here you have a graphical representation. You already know how this works... Scroll down to the bottom of the page...

77 Phylogenetically conserved frameworks
At the bottom of the output you find this list. Now we have identified not only the TFs but also the exact positions which are worth a closer look. You can scan all of our promoter databases with your saved frameworks for promoters with similar organization.
Why should I do this? Would this give me additional information?

78 Phylogenetically conserved frameworks
Why should I do this? Would this give me additional information?
In this example with an 8 element framework and almost no distance variation between the TF sites, you will find exactly 1 match in over human promoters: the input gene. How to use this approach with less selective frameworks for identification of similarly organized promoters? I'll show you later…

79 Knowledge based analysis
Fine! I think I have now seen two strategies. You mentioned three?
Yes. The third is knowledge driven and is based on a combination of literature data mining, sequence analysis and pathway/network analysis. For this you first need to download and install the Java client of BiblioSpherePE.

80 Knowledge based analysis

81 Knowledge based analysis
For more detailed introduction to BiblioSpherePE please have a look at

82 Knowledge based analysis
Choose "single gene" here... HMGCS1
...un-tick this box... We are interested in the full network around our gene, not only the connected transcription factors.

83 Knowledge based analysis

84 Knowledge based analysis
Here you have a list of all other genes connected to your input gene by at least one co-citation at abstract level in the entire PubMed.
This sets the context sensitive filter stringency. The most stringent level that includes computer based semantic analysis is an ordered Gene1 - function word - Gene2 level (B3). (B4) shows expert curated gene-gene relationships only. Expert knowledge is derived from different sources, like Genomatix experts, Molecular Connection's NetPro database, STKE, etc...
Click around, and see what happens!

85 Knowledge based analysis
I have intentionally chosen an example with no expert curation available, since I want to demonstrate how to generate new knowledge! This filters the co-citation frequency

86 Knowledge based analysis
Here you see the network around HMGCS1: all other genes connected on GFG level.

87 Knowledge based analysis
Here only the connected transcription factors on GFG level.

88 Knowledge based analysis
Now all connected transcription factors.

89 Knowledge based analysis
A connection line between two genes means that there is a bibliographic connection on abstract level (BO)...

90 Knowledge based analysis
"Mouse over" and clicking gives you more information...

91 Knowledge based analysis
The green indicates that there is a binding site for SREBF1 (V$SREB) in at least one of the promoters of HMGCS1

92 Knowledge based analysis
There is more encoded in the connection lines...

93 Knowledge based analysis
The little symbols give you some information about the gene and its association with pathways

94 Knowledge based analysis
The tagged text tells us that the TF SREBF1 is involved in regulation of HMGCS1.
Some more helpful options from this page...

95 Knowledge based analysis
This you know already... You can get all info about any gene you click up there... over here...

96 Knowledge based analysis
..as well as this.

97 Knowledge based analysis
..as well as this.
Hey, hey, hey! Stop it! I want to know about the regulation of my gene, not to play around with your Biblio...thing!

98 Knowledge based analysis
Hey, hey, hey! Stop it! I want to know about the regulation of my gene, not to play around with your Biblio...thing!
BiblioSphere PathwayEdition! We already found TFs of interest, known to be involved in regulation of our gene. Now let's see the biological environment of our gene and find a group of related genes which might share some regulatory motifs. Let's go back and display all genes contained in this network...

99 Knowledge based analysis
Let´s load the GO-Filter "biological process"...

100 Knowledge based analysis
Go to the table view by this tab... Here you see the tree for the selected filter. Expand and collapse by clicking on the +/-.

101 Knowledge based analysis
The Z-score gives you a measure of whether certain categories are significantly over- or under-represented in the displayed gene set. Everything above 3 is statistically significant! Top scoring are sterol and cholesterol metabolism... Clicking here opens the tree on the left and highlights the category as well as the respective genes in the pathway view.
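
The slides do not say how BiblioSphere computes its Z-score, so the following is only one common way to score over-representation: compare the observed number of category members in the gene set with the mean and variance expected under a hypergeometric model. The function name and all counts in the example are hypothetical.

```python
import math


def overrepresentation_z(hits_in_set, set_size, hits_in_background, background_size):
    """Z-score of observing hits_in_set category members in a gene set.

    Assumes the set is drawn without replacement from the background
    (hypergeometric mean and variance). Positive values mean over-representation.
    """
    p = hits_in_background / background_size
    expected = set_size * p
    variance = (set_size * p * (1 - p)
                * (background_size - set_size) / (background_size - 1))
    return (hits_in_set - expected) / math.sqrt(variance)


# Hypothetical numbers: 8 of 8 genes fall into a category that holds
# 100 of 20,000 background genes.
z = overrepresentation_z(8, 8, 100, 20000)
print(round(z, 1))
```

With numbers like these the score lies far above the threshold of 3 quoted on the slide, while a set that contains exactly the expected number of category members scores 0.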

102 Knowledge based analysis
This finally applies the filter to your gene set. Superimpose as many filters as you´d like !

103 Knowledge based analysis
We see two TFs in here, SREBF1 and SREBF2, both Sterol Regulatory Element Binding Protein factors. Double-click on SREBF1 in order to see all connections to that TF. The "redraw" button.

104 Knowledge based analysis
Another table view...

105 Knowledge based analysis
...the colors encode for... Highlight those genes with your mouse, and copy them...

106 Knowledge based analysis
Now we have expanded our single input gene with a set of seven additional genes! And we already know quite a lot about them!
They are all connected with my original gene in PubMed.
All genes belong, with very high statistical significance, to the GO category "Cholesterol Metabolic Process".
SREB transcription factors seem to play a role in the regulation of those genes.
Now let's check whether the promoters of those genes share a complex framework. For that we first need to export those genes into GenomatixSuite's Gene2Promoter.

107 Back to sequence level
Oh my god... more... Where do I find this now?
Relax! It's easy and not far away...

108 Back to sequence level
Paste the gene symbols which we just copied in BiblioSpherePE here: APOA1, LDLR, SREBF2, VLDLR, FDFT1, FDPS, MVK, HMGCS1
Don't forget this! Otherwise you will be asked for all findings in all organisms.

109 Back to sequence level

110 Back to sequence level
Hey, stop! Haven't I seen this before?
You are right! It is pretty much the same display as the comparative genomics page which we generated earlier. The difference in this case is that we now compare promoters of different genes within one organism…

111 Back to sequence level
Eight loci with 26 different unique promoters! 9,216 combinations possible for exhaustive analysis! Combinatorial explosion! We have to find a way to circumvent this.
Since we are concentrating on SREB TF sites, let's concentrate on those promoters which contain a V$SREB binding site.
How should I know which ones? How do I do this?
Very easy! Just scroll down to the bottom of the page...
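
Where does a number like this come from? An exhaustive analysis picks one promoter per locus, so the number of combinations is the product of the promoter counts per locus. The slides only give the totals (8 loci, 26 promoters), so the per-locus split below is a hypothetical one that happens to reproduce both totals; the real split may differ.

```python
import math


def count_combinations(promoters_per_locus):
    """Number of ways to pick exactly one promoter from every locus."""
    return math.prod(promoters_per_locus)


# Hypothetical split of 26 promoters over 8 loci (the actual split is not given).
all_promoters = [4, 4, 4, 4, 3, 3, 2, 2]
print(sum(all_promoters), count_combinations(all_promoters))

# Hypothetical split after keeping only promoters that contain the TF site of
# interest: 12 promoters over the same 8 loci.
filtered = [2, 2, 2, 2, 1, 1, 1, 1]
print(sum(filtered), count_combinations(filtered))
```

Because the count is a product, removing even a few promoters per locus shrinks the search space dramatically, which is why filtering on a mandatory TF site makes the exhaustive analysis tractable.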

112 Back to sequence level
Select the desired TF-matrix family here

113 Back to sequence level
...and all relevant promoters are already checked for you.
Now we have reduced the set to 12 different promoters from 8 different loci, each containing at least one SREB site.

114 Back to sequence level
Scroll to the bottom of the Gene2Promoter result page... We have done this before...

115 Back to sequence level
You see? Now we have tolerable combinatorics and can perform an exhaustive promoter analysis.

116 Back to sequence level
Remember? We have been here before, too...
...but now we choose V$SREB as a mandatory element for our framework. Hint: you can select multiple elements by holding the "Ctrl" key while clicking.
...and with these parameters you have to play around a little bit. Start at default. Gradually relax stringency. Go down in Quorum Constraint step by step, or allow for higher distance variance (e.g. 20, 30, 40, 50, etc.).
The lower the distance variance and the more elements per model, the higher the resulting model selectivity.

117 Back to sequence level For example, at a quorum of 30%, an allowed distance range of 5 to 200 bp, a distance variance of 50 bp and a maximum of 10 elements allowed, we find quite a lot of frameworks in the different promoter combinations. There are frameworks with 6 elements! This is quite significant and expected to be extremely selective. With 6 elements I expect to find only the 3 genes from which these models were derived: SREBF2, HMGCS1, and MVK. Tick the boxes of the models for a subsequent database search for other promoters with similar organization.
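To make the distance parameters concrete, here is a Python sketch of how a pair of neighbouring elements could be tested across promoters. It assumes that "distance variance" means the largest allowed difference between the inter-element distances observed in different promoters; that reading, and the function itself, are illustrative assumptions rather than FrameWorker's actual implementation.

```python
def pair_is_conserved(distances_bp, min_dist=5, max_dist=200, max_variance=50):
    """Check one element pair across several promoters.

    distances_bp: distance between the two elements in each promoter.
    Every distance must lie within [min_dist, max_dist], and the spread
    between the smallest and largest distance must not exceed
    max_variance (assumed meaning of "distance variance").
    """
    if not distances_bp:
        return False
    in_range = all(min_dist <= d <= max_dist for d in distances_bp)
    spread = max(distances_bp) - min(distances_bp)
    return in_range and spread <= max_variance

print(pair_is_conserved([40, 62, 85]))   # spread of 45 bp, all within range
print(pair_is_conserved([40, 120]))      # spread of 80 bp exceeds 50 bp
print(pair_is_conserved([3, 20]))        # 3 bp is below the 5 bp minimum
```

Each additional element adds another such constraint, which is why frameworks with more elements are expected to be more selective.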

118 Back to sequence level Now let's see how selective this model is...
Scroll all the way down... This list is quite interesting! Here we have the different TF sites in this set of frameworks. This list represents those TFs which we should concentrate on when analyzing the regulation of the original input gene. It is pretty comparable to the list from our phylogenetic approach before. There is now good evidence that those factors play a role in regulation in the biological context of cholesterol metabolism.

119 Back to sequence level It is just one click away...

120 Back to sequence level This should look familiar to you !
But now we are going for the database section... Unless you have a good reason to do otherwise, always go for the database of promoters of annotated genes. This allows for GO-group Z-scoring of the database hits later on...

121 Back to sequence level This is a termination parameter.
If this number of hits is reached before the end of the database, the search is terminated. Careful! Some browsers crash when too many hits (>10,000) have to be displayed in HTML! A database search usually takes several minutes. In order to avoid a server time-out go for the option. You'll receive a mail with a direct link to your result file (it will be kept in your "Results Directory", too).
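The effect of such a termination parameter can be sketched in a few lines of Python: scanning stops as soon as the hit limit is reached, so the rest of the database is never examined. The function, the toy database and the matching rule are all hypothetical stand-ins, not the ModelInspector API.

```python
def search_database(promoters, matches_model, max_hits):
    """Scan promoters in order and stop once max_hits hits are collected.

    Returns the hits and the number of promoters actually scanned.
    """
    hits = []
    scanned = 0
    for promoter in promoters:
        scanned += 1
        if matches_model(promoter):
            hits.append(promoter)
            if len(hits) >= max_hits:
                break  # termination parameter reached before end of database
    return hits, scanned

# Toy database and toy matching rule, for illustration only.
database = [f"promoter_{i}" for i in range(1000)]
hits, scanned = search_database(database, lambda p: p.endswith("0"), max_hits=5)
print(len(hits), scanned)
```

A hit list cut off this way depends on the database order and is not a complete result, so a model that hits the limit should rather be made more selective.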

122 Back to sequence level Eight matches! In four sequences.
Each model matches exactly once per sequence... The three genes of our "training set"... ...out of a total of different promoters Wow !

123 Back to sequence level ...plus one additional "new" gene!
This one was not in our input list and is identified only by common promoter organization!
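Separating "training set" genes from newly found ones is a simple set difference on the hit list. A Python sketch, assuming two selected models (which is what eight matches in four sequences at one match per model and sequence implies); the model labels are invented, and MMAB is the additional gene named on a later slide.

```python
# Genes from which the models were derived (from the slides).
training_set = {"SREBF2", "HMGCS1", "MVK"}

# Illustrative hit list as (gene, model) pairs; the labels "model_1" and
# "model_2" are hypothetical.
hits = [
    (gene, model)
    for gene in ("SREBF2", "HMGCS1", "MVK", "MMAB")
    for model in ("model_1", "model_2")
]

hit_genes = {gene for gene, _ in hits}
new_genes = sorted(hit_genes - training_set)
print(len(hits), len(hit_genes), new_genes)
```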

124 Back to sequence level Those four genes now are extremely likely to share common regulation in the given biological context! The TFs in the framework now are the top candidates for further inspection.

125 Back to sequence level STOP!! First I had too many matches in MatInspector, now there are too many slides!!

126 New Knowledge I am terribly sorry for that!
However, eukaryotic transcriptional regulation is pretty complex. Our group of researchers has worked in this field for more than two decades. As you have seen, our tools - though pretty easy to use - require some explanation and sometimes a slightly different mindset, going beyond looking at single, isolated TF binding sites. I hope I was able to show you some basic strategies to follow. Nevertheless, let's have a final look at the additional gene which we found with the database search in our example...

127 New Knowledge

128 New Knowledge

129 New Knowledge MMAB is a transferase involved in vitamin B12 activation and linked to a disease: methylmalonyl aciduria

130 New Knowledge Feeding all 4 genes from ModelInspector into BiblioSpherePE shows that they are all connected plus...

131 In our example, we started with a single gene (HMGCS1; ElDorado), put it into biological context and concentrated on a potential regulator (SREB; BiblioSpherePE), identified common promoter organization (TF-Framework; GEMS Launcher, FrameWorker), searched for additional genes with similar promoter organization (GEMS Launcher, ModelInspector) and put the genes back into biological context. Literature confirmed that we had indeed found a co-regulated network and identified its molecular basis. This could NEVER be achieved by statistical analysis of isolated TFBS.

132 There is so much more in GenomatixSuite PE. I have not said a word about matrix generation, about direct experimental planning for knock-out/knock-in experiments with SequenceShaper, about expanding the hit list by shortening the framework, etc. Get in touch with us via and we will give you a tour through the entire system in a web meeting. Some informative links:

