© 2007 Genomatix Software GmbH - Too.

1 © 2007 Genomatix Software GmbH - Too many matches…

2 © 2007 Genomatix Software GmbH - Too many matches… A typical question: What are the potential TF sites involved in regulation of my gene of interest? A typical approach: “Let´s run MatInspector over the promoter region of my gene.”

3 © 2007 Genomatix Software GmbH - Too many matches… A typical question: Where do I get my input promoter DNA sequence from? A typical approach: “Let´s extract from NCBI. 3 kb upstream of TSS to be sure to have the promoter…”

4 © 2007 Genomatix Software GmbH - A typical result: Too many matches… Which of those matches are relevant? How do I get rid of all those “false positives” ?

5 © 2007 Genomatix Software GmbH - TF binding sites… Important facts to consider: There is not a single false positive match. MatInspector gives you all physical TF binding sites. A single isolated TF binding site carries no function. TFs work through complexes, which are represented at the sequence level by sets of TF binding sites in a certain distance relationship and orientation -> promoter frameworks. A physical TFBS is found every 10 to 15 bp throughout the genome.

6 © 2007 Genomatix Software GmbH - Okay, what is now a physical TF binding site ? What is a functional TF binding site? TF binding sites…

7 © 2007 Genomatix Software GmbH False positives? A physical binding site is a fixed part of the genome. A physical binding site is invariable. This DNA sequence usually can bind to its cognate protein(s). Physical binding sites have no function in transcription on their own. Physical binding sites can be detected by MatInspector (weight matrix / IUPAC string).
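
The slide above mentions that physical binding sites can be described by an IUPAC string. As a rough illustration only (this is not MatInspector's algorithm, which works with weight matrices), the following Python sketch scans both strands of a sequence for an IUPAC consensus. The function names and the example consensus `TATAWAW` are assumptions made for this sketch.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA or IUPAC string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, consensus):
    """Return sorted (start, strand) tuples for every consensus match.

    Positions are 0-based on the forward strand. This is a hypothetical
    helper for illustration, not part of any Genomatix tool.
    """
    sequence = sequence.upper()
    hits = set()
    for strand, motif in (("+", consensus.upper()),
                          ("-", reverse_complement(consensus))):
        # A lookahead makes the search report overlapping matches too.
        pattern = re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")
        hits.update((m.start(), strand) for m in pattern.finditer(sequence))
    return sorted(hits)
```

A hit found this way is only a physical site in the sense of the slide: the string matches, nothing more.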

8 © 2007 Genomatix Software GmbH Physical vs functional TFBS. Transcriptional function is defined by the cellular and genomic context. A functional binding site depends on context! A functional binding site requires a cellular context: one binding site, five cell types... but binding proteins are present only in 2 cell types! -> No functional binding site in the other 3 cell types! A functional binding site requires a genomic context: even when binding proteins are present... biological function may require additional binding sites! (Module)

9 © 2007 Genomatix Software GmbH Transcriptional modules. A transcriptional module is the smallest functional unit. A transcriptional module consists of two or more TFBSs. Strand orientation, relative order and distance of TFBSs are important. A module also has a strand orientation and can shift within a promoter. [Figure labels: F1 +, F2 -, F3 +/-] Transcriptional modules are present in promoters and enhancers. The core promoter (TATA box, INR box) - just another module. Transcriptional modules integrate signals via the interacting TFs.
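
To make the module definition above concrete, here is a minimal Python sketch of a two-element module: two binding sites with fixed strand orientation, fixed order, and a bounded distance. The `Site` type, the `find_module` function, the family labels and all coordinates are hypothetical. The sketch ignores the fact that a whole module can also occur on the opposite strand.

```python
from typing import NamedTuple


class Site(NamedTuple):
    family: str  # label of the TF binding site family (illustrative)
    start: int   # position on the promoter sequence
    strand: str  # "+" or "-"


def find_module(sites, first, second, min_dist, max_dist):
    """Return (a, b) site pairs that satisfy a two-element module.

    `first` and `second` are (family, strand) tuples. A pair matches when
    `a` is the first element, `b` the second, and b.start - a.start lies
    in [min_dist, max_dist], so order and distance are both enforced.
    """
    matches = []
    for a in sites:
        if (a.family, a.strand) != first:
            continue
        for b in sites:
            if (b.family, b.strand) != second:
                continue
            if min_dist <= b.start - a.start <= max_dist:
                matches.append((a, b))
    return matches


# Made-up sites on one promoter.
example_sites = [
    Site("NFKB", 100, "+"),
    Site("CEBP", 130, "+"),
    Site("CEBP", 400, "+"),
    Site("NFKB", 500, "-"),
]
module_hits = find_module(example_sites, ("NFKB", "+"), ("CEBP", "+"), 10, 50)
```

Only the NFKB site at 100 and the CEBP site at 130 form a match here; the other two sites are isolated physical sites with respect to this module.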

10 © 2007 Genomatix Software GmbH Why does nature use modules? No common organization? Common modules! [Figure: elements A, B and C in several promoters]

11 © 2007 Genomatix Software GmbH Promoter modules can work in three different ways High / Low Is possible High / Low Is possible High / Low Is possible High / High only Binding Affinity: Synergistic “Composite elements” Antagonistic or Synergistic “Short range module” distance ≤ 50 bp “Looping module” distance up to 300bp “Short range module” distance ≤ 50 bp Transcriptional modules

12 © 2007 Genomatix Software GmbH Transcriptional modules. Modules are the basic elements of regulatory pathways and networks. Transcriptional modules define target genes of pathways. NFkappaB regulates a number of “target genes”: IL-6, IL-8, ICAM-1, SAA-1, SAA-2, ELAM-1, IFN-ß, IP-10, G-CSF, IL-2, HLA-A, HLA-B, IL-1, E-Selectin. [Figure labels: C/EBP, CREB, IRF-1, NFkappaB; NFkB, CREB, NFkB, C/EBP, NFkB, IRF-1, NFkB] NFkappaB is involved in regulation of target genes of several pathways. Induced by 2 pathways!

13 © 2007 Genomatix Software GmbH - Transcriptional modules. Key – lock principle. [Figure: protein complex binding at transcription factor binding sites. Labels: distal promoter/enhancer, proximal promoter, core promoter, TF binding sites, „DNA-looping“, TATA, INR, TBP, TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA polymerase II]

14 © 2007 Genomatix Software GmbH - Transcriptional modules. Transcription regulation mechanism. Transcription regulation implies a regulatory network. [Figure labels: Gene A, transcript n; Gene B, transcript p; Gene C, transcript m; promoter; exon; primary transcript; protein complex]

15 © 2007 Genomatix Software GmbH - Context dependent expression by different protein complexes Same lock – different keys: Same gene - different biological context Transcriptional modules

16 © 2007 Genomatix Software GmbH - Transcriptional modules. Context specific transcription regulation. Example: Analysis of the RANTES promoter in different cell lines. Experimentally verified evidence that TFBSs from modules which are crucial for regulation in one biological context (cell type) are totally irrelevant in another! Fessele, S., Maier, H., Zischek, C., Nelson, P.J., Werner, T. (2002) "Regulatory context is a crucial part of gene function" Trends in Genetics 18,

17 © 2007 Genomatix Software GmbH Module matches reduce experimental efforts by orders of magnitude Modules contribute strongly to functional promoter analysis Modules are usually linked to at least one known biological function A module match in a promoter makes this gene a good candidate Additional independent evidence is required to prove the target A module match in a promoter does not prove the gene to be a target A module match immediately suggests experimental verification Transcriptional modules

18 © 2007 Genomatix Software GmbH - Very interesting – but how does all this help me with my original question ? The question still is: What are the potential TF sites involved in regulation of my gene of interest ? Promoter sequences

19 © 2007 Genomatix Software GmbH - Promoter sequences. More things to consider before asking that question! There was another one: Where do I get my input promoter DNA sequence from? “Let´s extract from NCBI. 3 kb upstream of TSS to be sure to have the promoter…”

20 © 2007 Genomatix Software GmbH - Promoter sequences. More things to consider… 3 kb is too large for meaningful analysis. Even going 10 kb upstream of the TSS is no guarantee of having the relevant promoter sequence. Multiple promoters are the rule, not an exception. The non-coding first exon is always part of the promoter. Huh? What does this mean? Where do I get this damn promoter now?
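
As a sketch of what "promoter around a TSS" means at the sequence level, the following Python function cuts a short window around a transcription start site and returns it in the gene's own reading direction. The function name, the 0-based coordinate convention and the default window of 500 bp upstream and 100 bp downstream are assumptions for illustration; the slides do not state the ElDorado defaults.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extract_promoter(chrom_seq, tss, strand, upstream=500, downstream=100):
    """Return the promoter window around a TSS, read 5'->3' on the gene strand.

    `tss` is a 0-based index into `chrom_seq`. The window holds `upstream`
    bases before the TSS and `downstream` bases starting at the TSS, so part
    of the first exon is included. Window sizes are illustrative assumptions.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    start, end = max(start, 0), min(end, len(chrom_seq))
    region = chrom_seq[start:end].upper()
    if strand == "-":
        region = region.translate(COMPLEMENT)[::-1]
    return region
```

For a gene with several alternative first exons, this would be called once per TSS, giving one promoter sequence per alternative transcript start.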

21 © 2007 Genomatix Software GmbH - Which promoter? One gene = one promoter ? Genes usually have alternative transcripts with alternative promoters Gene A? Alternative transcripts/promoters

22 © 2007 Genomatix Software GmbH - Context dependent expression via different promoters Example: Glucokinase Coding exons Hepatic promoter Pancreatic promoter Y Tanizawa, A Matsutani, KC Chiu, and MA Permutt Human glucokinase gene: isolation, structural characterization, and identification of a microsatellite repeat polymorphism Mol. Endocrinol., Jul 1992; 6: Alternative transcripts/promoters

23 © 2007 Genomatix Software GmbH - Comparative genomic map of the Glucokinase GCK Promoter set 1 Pancreatic promoter Data from ElDorado Alternative transcripts/promoters Promoter set 2 Hepatic promoter

24 © 2007 Genomatix Software GmbH - Important facts to consider: Alternative promoter usage is often tied to regulation of tissue specific gene expression Alternative promoter usage is of very high biological relevance. There are several examples where aberrant regulation of the identical primary transcript leads to severe biological effects Alternative transcripts/promoters

25 © 2007 Genomatix Software GmbH Alternative transcripts/promoters. Aromatase: Switch in promoter usage is associated with disease. [Figure: aromatase gene with Roman-numeral exon labels up to X and the AATAAA signal, comparing normal breast and breast cancer] The gene product is absolutely identical. The only difference is in the alternative promoter usage. At the transcript level this can be seen only in the non-coding first exon.

26 © 2007 Genomatix Software GmbH The aim of in silico promoter analysis - summary context 1 context 2 context 3 : context n 1. Identification of the promoter sequence 2. Prediction of physical transcription factor binding sites 3. Functional context 4. Context dependent functional transcription factor binding sites Promoter Analysis

27 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Yes! I know all of this! I just wanted to know where I can get my promoter sequence(s) easily! … If you don´t have one already, first sign up for a free evaluation account, then log in here!

28 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval

29 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Choose the organism. Either enter the locus ID or the gene name here… an accession number… or an exact contig position. Or choose a sequence file from your directory… upload a file from your local disk… or copy & paste a raw sequence here. It can be cDNA or whatever you have. It will be mapped exactly to the genomes within seconds.

30 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. HMGCS1 (for example). Input in this section delivers results based on gene name or keyword search. Over a million names, synonyms and gene IDs help to find what you want - fast! IMPORTANT! Affymetrix probe-set-ID input: Our annotation is NOT based on the Affymetrix NetAffx assignment! It is rather based on genomic mapping of each single probe. A transcript will be retrieved if at least one probe of the set (usually 11 probes) matches. For mixed probe sets (cross-hybridisation), all relevant transcripts will be retrieved, which might lead to a result with transcripts from different loci. Input in this section delivers results based on ultra fast sequence mapping. Copy and paste raw sequence data here (min. 15 nucleotides) or enter an accession number. In contrast to the entry of an accession number above, here the sequence is actually retrieved from the database and mapped onto the genome(s). NOTE: many EST based accession numbers have poor sequence homology and deliver no result.
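
The probe-set rule described above (a transcript is retrieved if at least one probe of the set maps to it) can be sketched in a few lines of Python. The function name and all probe and transcript identifiers are made up; the real mapping is done by ElDorado on genomic coordinates.

```python
def transcripts_for_probe_set(probe_hits):
    """Count, per transcript, how many probes of one probe set map to it.

    `probe_hits` maps a probe id to the set of transcript ids it matches.
    Every transcript hit by at least one probe appears in the result, so a
    mixed (cross-hybridising) probe set can return several transcripts.
    """
    support = {}
    for transcripts in probe_hits.values():
        for transcript in transcripts:
            support[transcript] = support.get(transcript, 0) + 1
    return support


# Hypothetical probe set with three probes.
example_hits = {"probe_1": {"T1"}, "probe_2": {"T1", "T2"}, "probe_3": set()}
probe_support = transcripts_for_probe_set(example_hits)
```

Keeping the per-transcript probe count makes it easy to see which transcripts are supported by most of the set and which by a single stray probe.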

31 © 2007 Genomatix Software GmbH - … here you can choose which chip´s probes to see... ElDorado promoter sequence retrieval … licensed customers can add their own sequence data

32 © 2007 Genomatix Software GmbH - This gives you an interactive graphical representation of the genomic context of your gene ElDorado promoter sequence retrieval

33 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Everything is clickable – just play around! Here you can scale the view, switch the display of components on and off, show the mapping positions of Affymetrix single probes, scale/slide the retrieved genomic "window", and select regions of the graphics and save them to a file. Orange indicates your input, in this case a gene name. It is very informative when your query is based on sequence data: then you see the mapping positions.

34 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Now we have zoomed into the promoter region. Clicking on this transcriptional start region (TSR)... displays this hyperlink to...

35 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval...this profile of the different experimentally verified TSS (CAGE tags) in the different tissue types.

36 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. This is a table-like representation of all annotated elements. It is especially useful for quick and easy retrieval of the DNA sequence(s) of interest.

37 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval Tick/un-tick the boxes of what you would like to see, and then...

38 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval This for instance......tells you that this SNP deletes three potential TF binding sites and creates a new one. A potential regulatory active SNP...
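
The idea on this slide, a SNP that deletes some potential binding sites and creates others, amounts to comparing the site lists of the reference and the variant sequence. Below is a minimal Python sketch that uses exact string motifs on the forward strand only. The function names, motif names and sequences are invented for illustration, and real site prediction would use weight matrices rather than exact strings.

```python
def sites_in(seq, motifs):
    """Return {(name, position)} for every exact motif occurrence."""
    found = set()
    for name, motif in motifs.items():
        pos = seq.find(motif)
        while pos != -1:
            found.add((name, pos))
            pos = seq.find(motif, pos + 1)
    return found


def snp_effect(ref_seq, snp_pos, alt_base, motifs):
    """List sites lost and gained when one base is substituted."""
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    ref_sites = sites_in(ref_seq, motifs)
    alt_sites = sites_in(alt_seq, motifs)
    return {
        "lost": sorted(ref_sites - alt_sites),
        "gained": sorted(alt_sites - ref_sites),
    }


# Hypothetical motifs and a hypothetical G->C substitution at index 6.
example_motifs = {"siteA": "TGACGT", "siteB": "TGACCT"}
effect = snp_effect("CCTGACGTCC", 6, "C", example_motifs)
```

In this toy case the substitution destroys `siteA` and creates `siteB` at the same position.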

39 © 2007 Genomatix Software GmbH - from here you can directly run a MatInspector analysis for this promoter......again,play around with the interactive graphics... Click the symbols and jump right into MatBase, the TF knowledge base.. ElDorado promoter sequence retrieval

40 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval now, finally the first way to extract a promoter sequence......and/or any other element displayed in the list below. Choose your desired length. Unless you have good reason to change the length of the proximal promoter, leave the defaults!

41 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval This shows you all annotated alternative transcripts plus all Affymetrix probe set single probe mappings plus another way to extract your promoter sequence(s)

42 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval You know this already... Three different known transcripts for this locus and four distinct promoters ! How this comes, I´ll tell you in a minute

43 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval Tick the promoter of your interest......choose format......and extract the sequence. Or submit the promoter directly to MatInspector for graphical analysis. It works on a single sequence, too. Or submit sequences directly to one of those tasks. But they make sense only with multiple sequences. More on that later!

44 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval But why do I have four promoters here? And two even don´t have a transcript assigned, as it is written here! And what´s all that CompGen thing about? The multiple promoter thing I showed you before. Remember the GCK example, liver and pancreas? Now to the CompGen promoters. They are derived by a proprietary comparative genomics approach.

45 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval

46 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Exhaustive cross-mapping of all transcripts to all genomes of all organisms in ElDorado generates our homology groups. The tick-boxes you know already... We need them for later promoter retrieval. Note the Promoter Set number! For our example we have a homologous locus assigned in chimp, macaca, human, rat, dog, cow, opossum, chicken, and zebrafish.

47 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval Get a feeling for the degree of phylogenetic conservation of the resp. promoter. See how much experimental evidence supports this promoter

48 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval You should be familiar with this view, now. Here the orange indicates a promoter belonging to a promoter set. With these tick- boxes you can switch on and off the display of the different Promoter Sets A Promoter Set represents phylogenetically conserved promoters

49 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval Don´t waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take ? Well, which one? If you do not have any other information (experimental or from literature), I would recommend that you consider all available alternative promoters for further analysis

50 © 2007 Genomatix Software GmbH - Two easy ways of promoter sequence retrieval by two mouse clicks I showed you some minutes ago. There are more... ElDorado promoter sequence retrieval Don´t waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take ? oh... you cannot access these options?

51 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. You should license GenomatixSuite with at least the 10-fold evaluation account upgrade. Otherwise it is slightly more cumbersome... Use one of the options I showed you before to get the contig and positional information, and use that for sequence retrieval from your second-favorite system after Genomatix, e.g. NCBI. Hint: If you are interested in the TF results rather than the sequence, use the “search for common transcription factor binding sites” option as shown before.

52 © 2007 Genomatix Software GmbH - From physical to functional TF site Quite interesting… But I am not a single step closer to the answer of my real question: What are the potential TF sites involved in regulation of my gene of interest ? Well, I think you are. Essential first step is to analyze the right sequence in a length that allows for meaningful results. Now that you have the real promoter sequence(s), let´s see how to go on from here...

53 © 2007 Genomatix Software GmbH - From physical to functional TF site. The ideal situation for determining potential functional binding sites would be to have a set of genes that are apparently co-regulated in the given cellular and experimental context, e.g. from a microarray experiment. A comparative promoter analysis with FrameWorker would very likely give you a pattern of involved TFs, as shown in numerous publications (see our web site at “About us -> Publications”). But I have only a single gene. And that´s the one I am interested in! Then we have to look for additional evidence that some of the physical TF sites might be functional ones. Best would be to go for a ChromatinIP experiment. However, for that you would need some hints as to which TF to make or buy antibodies for. Further computer analysis is required anyhow! There are three different roads to go...

54 © 2007 Genomatix Software GmbH - We talked about promoter modules before. Search your sequence for promoter modules with ModelInspector. Our Promoter Module Library contains over 550 promoter modules, each of them experimentally verified to carry transcriptional regulatory activity. A module match increases probability that an involved TF site is functional. Look for phylogenetically conserved patterns of TF sites in a comparative genomics promoter set with FrameWorker. TFs being part of such phylogenetically conserved frameworks carry higher probability for being functional. Do extensive literature data mining with BiblioSpherePE for known TF correlations, pathway analysis and gene set creation for comparative promoter analysis. TFs showing biological activity in another experimental context are functional (at least in that context). From physical to functional TF site Okay, how do I do this? Let´s go !

55 © 2007 Genomatix Software GmbH - ElDorado promoter sequence retrieval. Let´s start with an analysis for promoter modules...

56 © 2007 Genomatix Software GmbH - Search for promoter modules If you are licensed, you can have a quick look at the promoter module library. Each module is experimentally verified to carry regulatory activity.

57 © 2007 Genomatix Software GmbH - Choose a sequence file from your directory Or copy & paste a raw sequence here. or… you know the rest ! Don´t click anything below, unless you want to scan an entire data base ! Search for promoter modules

58 © 2007 Genomatix Software GmbH - go for vertebrate modules...Click here! You can wait for the result… Search for promoter modules

59 © 2007 Genomatix Software GmbH - Search for promoter modules

60 © 2007 Genomatix Software GmbH - Search for promoter modules

61 © 2007 Genomatix Software GmbH - Search for promoter modules. Now we have focused down to 21 very interesting positions in this promoter with modules that are composed of a total of 11 different transcription factor binding sites. Our arbitrarily chosen example HMGCS1 belongs to the cholesterol biosynthesis pathway. Some of the promoter modules found do have a proven function in sterol regulation! … Wow! That´s impressive! But that example is a mock-up, isn´t it? Not really. It is a nice example to show this approach. Very frequently one finds functionally related modules. However, there is no guarantee… It adds just another line of evidence.

62 © 2007 Genomatix Software GmbH - Not really. It is a nice example to show this approach. Very frequently one finds functionally related modules. However, there is no guarantee… It adds just another line of evidence. Okay, how does the other thing help? How did you call it, phylogenetically conserved frameworks? That´s right. For this approach you first need a set of phylogenetically conserved promoters. Remember several slides before ? Phylogenetically conserved frameworks

63 © 2007 Genomatix Software GmbH - Inspect and choose your Promoter Set... ElDorado promoter sequence retrieval...scroll to the top of the page... and tick the promoters of one set. In this example I choose Promoter Set 3 for human, rat, dog and cow.

64 © 2007 Genomatix Software GmbH - From here you can have a look at TF binding sites which are common to the input promoters Phylogenetically conserved frameworks...scroll down... … Great ! That is what I really want to know: Which TF sites do they have in common?

65 © 2007 Genomatix Software GmbH - Phylogenetically conserved frameworks. … Great! That is what I really want to know: Which TF sites do they have in common? Be careful!! This is not more than a tiny hint! I can show you many cases where totally unrelated exons have more TF sites in common than closely co-regulated promoters. What you are really looking for is a conserved pattern of TF sites. And we are going to look for one. But first let´s have a look at the nucleotide sequence level...
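
The warning on this slide can be illustrated with a toy comparison in Python: two promoters may share the same set of TF site families while the order of the sites differs, so "common sites" alone says nothing about a conserved pattern. The function names, family labels and positions below are invented for the sketch.

```python
def common_families(promoters):
    """Families that occur at least once in every promoter.

    `promoters` maps a promoter name to a list of (family, start) tuples.
    """
    family_sets = [{fam for fam, _ in sites} for sites in promoters.values()]
    return set.intersection(*family_sets) if family_sets else set()


def site_order(sites, families):
    """Order in which the given families appear along one promoter."""
    return [fam for fam, _ in sorted(sites, key=lambda s: s[1])
            if fam in families]


# Made-up site lists for two promoters.
toy_promoters = {
    "human": [("SREB", 10), ("NFY", 40), ("SP1", 70)],
    "rat": [("SP1", 5), ("SREB", 50), ("NFY", 80)],
}
shared = common_families(toy_promoters)
orders = {name: site_order(sites, shared) for name, sites in toy_promoters.items()}
```

Both toy promoters share all three families, yet the order along the sequence is different, which is why a framework search also has to take order and distance into account.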

66 © 2007 Genomatix Software GmbH - DiAlign TF gives an overlay of a true multiple sequence alignment (not pairwise) and common TF sites. Check DiAlign for other sequences (including amino acids)! It is extremely fast and especially powerful for finding short homologies in largely unrelated sequences. Phylogenetically conserved frameworks

67 © 2007 Genomatix Software GmbH - The parameters should be self explanatory. You can always click for help Phylogenetically conserved frameworks

68 © 2007 Genomatix Software GmbH - Here an output example. Phylogenetically conserved frameworks

69 © 2007 Genomatix Software GmbH - It is pretty informative to get a feeling for the degree of homology, which parts are more conserved than others and which TF binding sites reside in the homologous parts. Then, it is of interest to see where the evolutionary pressure was rather on functional conservation (TFBS) than on sequence conservation. Phylogenetically conserved frameworks Why did you do this? What does it tell me?

70 © 2007 Genomatix Software GmbH - Then, if you do a framework analysis on two highly homologous sequences we run into a combinatorial explosion. FrameWorker checks for it and might give you a warning. However, in this case everything is fine... Phylogenetically conserved frameworks Why did you do this? What does it tell me?

71 © 2007 Genomatix Software GmbH - If you do a framework analysis on two highly homologous sequences we run into a combinatorial explosion. FrameWorker checks for it and might give you a warning. However, in this case everything is fine... Phylogenetically conserved frameworks … Why did you do this? What does it tell me? Now, we finally go to the FrameWorker analysis!

72 © 2007 Genomatix Software GmbH - Here you can select for TFs only, known to be associated with certain tissues. Here you can choose the matrix library Phylogenetically conserved frameworks This filter is a positive filter! Only TFs known to be associated with a tissue are listed here. A TF not listed in a certain tissue does NOT mean that it is not expressed there! It just has not been reported, yet.

73 © 2007 Genomatix Software GmbH - More options gives you... Phylogenetically conserved frameworks...well, more options ! Don´t change those parameters unless you know exactly what you are doing !

74 © 2007 Genomatix Software GmbH - Phylogenetically conserved frameworks. Use it with care! It slows down FrameWorker considerably! This decides the number of input sequences which have to show a common pattern of TF sites. This sets the distance constraints between two adjacent TF sites. More important than the absolute distance is the distance variance. Always start at default values (unless you already know better) and relax gradually if nothing meaningful is found. If you know that a certain TF is involved in the regulation of your gene, make it a mandatory element and search only for frameworks containing it. Mandatory elements are most helpful in focusing your analysis. If you don´t know one a priori, I´ll show you later in BiblioSpherePE how to get to those. Toggle multiple choices by holding the "Ctrl" key when clicking! This option gives you an idea of the specificity of the frameworks found. It checks how often a framework would be found in a background of random human promoter sequences. One word on this parameter: it decides the minimum/maximum number of TF sites allowed in one framework. In this case I increased the default value from 6 up to 10 since we want to identify the largest conserved pattern in this phylogenetic promoter set. We might lower this later. And always think about the HELP pages!
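
To give a feeling for how the parameters on this slide interact, here is a deliberately small Python sketch that searches for two-element frameworks only. It is not the FrameWorker algorithm. The function name, the parameter names, the default values and the reading of "distance variance" as the maximum spread of distances across sequences are all assumptions made for this illustration.

```python
from itertools import product


def pair_frameworks(promoters, quorum, min_dist=5, max_dist=200,
                    variance=10, mandatory=None):
    """Find ordered, strand-aware site pairs shared by >= quorum promoters.

    `promoters` maps a name to a list of (family, start, strand) tuples.
    A pair qualifies when at least `quorum` promoters contain it with a
    distance in [min_dist, max_dist] and those distances fit into one
    window of width `variance`. `mandatory` optionally names a family
    that must be part of the pair.
    """
    # distances[(elemA, elemB)][promoter] = distances seen in that promoter
    distances = {}
    for name, sites in promoters.items():
        for (fa, pa, sa), (fb, pb, sb) in product(sites, repeat=2):
            d = pb - pa
            if min_dist <= d <= max_dist:
                key = ((fa, sa), (fb, sb))
                distances.setdefault(key, {}).setdefault(name, []).append(d)

    frameworks = []
    for pair, per_seq in distances.items():
        if mandatory and mandatory not in (pair[0][0], pair[1][0]):
            continue
        if len(per_seq) < quorum:
            continue
        # Try each observed distance as the lower edge of the window.
        lows = sorted({d for ds in per_seq.values() for d in ds})
        best = max(
            sum(any(lo <= d <= lo + variance for d in ds)
                for ds in per_seq.values())
            for lo in lows
        )
        if best >= quorum:
            frameworks.append((pair, best))
    return sorted(frameworks)


# Invented site lists for three orthologous promoters.
toy_set = {
    "human": [("SREB", 100, "+"), ("NFY", 130, "+"), ("SP1", 300, "+")],
    "rat": [("SREB", 90, "+"), ("NFY", 125, "+"), ("SP1", 150, "+")],
    "dog": [("SREB", 200, "+"), ("NFY", 232, "+")],
}
strict = pair_frameworks(toy_set, quorum=3)
```

In the toy data only the SREB/NFY pair is present in all three promoters with distances of 30, 35 and 32, so it is the only framework found at a quorum of 3. Tightening `variance` to 2 removes it again, which mirrors the slide's point that the distance variance matters more than the absolute distance.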

75 © 2007 Genomatix Software GmbH - Phylogenetically conserved frameworks. All four promoters have 18 TF sites in common. This number might differ from the „search for common TF“ job earlier, since now we take strand specificity into account. The longest frameworks contain 8 TF sites. There are 4 different frameworks. If you click the link, you jump directly to the graphical representation.

76 © 2007 Genomatix Software GmbH - Here you have a graphical representation. You already know how this works... You can save this framework in your personal directory for subsequent sequence or database scans Phylogenetically conserved frameworks Scroll down to the bottom of the page... Here you see the detailed description of the framework. It is perfectly conserved throughout the species

77 © 2007 Genomatix Software GmbH - Phylogenetically conserved frameworks At the bottom of the output you find this list. Now we not only have identified the TFs but also the exact positions which are worth a closer look. You can scan with your saved frameworks all of our promoter databases for promoters with similar organization. Why should I do this? Would this give me additional information ?

78 © 2007 Genomatix Software GmbH - Why should I do this? Would this give me additional information ? In this example with an 8 element framework and almost no distance variation between the TF sites, you will find exactly 1 match in over human promoters: the input gene. How to use this approach with less selective frameworks for identification of similarly organized promoters? I'll show you later… Phylogenetically conserved frameworks

79 © 2007 Genomatix Software GmbH - Knowledge based analysis. Fine! I think I have now seen two strategies. You mentioned three? Yes. The third is knowledge driven and is based on a combination of literature data mining, sequence analysis and pathway/network analysis. For this you first need to download and install the Java client of BiblioSpherePE.

80 © 2007 Genomatix Software GmbH - Knowledge based analysis

81 © 2007 Genomatix Software GmbH - Knowledge based analysis For more detailed introduction to BiblioSpherePE please have a look at

82 © 2007 Genomatix Software GmbH - this box... We are interested in the full network around our gene, not only the connected transcription factors HMGCS1 Knowledge based analysis Choose "single gene" here...

83 © 2007 Genomatix Software GmbH - Knowledge based analysis

84 © 2007 Genomatix Software GmbH - Here you have a list of all other genes, being connected to your input gene by at least one co-citation in entire PubMed on abstract level Knowledge based analysis Click around, and see what happens ! This sets the context sensitive filter stringency. The most stringent including computer based semantic analysis is an ordered Gene1 – function word – Gene2 level (B3). (B4) shows expert curated gene-gene relationships only. Expert knowledge is derived by different sources, like Genomatix experts, Molecular Connection´s NetPro data base, STKE, etc...

85 © 2007 Genomatix Software GmbH - This filters the co-citation frequency I have intentionally chosen an example with no expert curation available, since I want to demonstrate how to generate new knowledge! Knowledge based analysis
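
The co-citation network and its frequency filter can be pictured with a small Python sketch: two genes are connected when they are mentioned in the same abstract, and the filter keeps only connections with a minimum number of shared abstracts. The function names and the three toy "abstracts" are assumptions for illustration; BiblioSphere's actual text analysis is more elaborate.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(abstract_genes):
    """Count, for each gene pair, the abstracts that mention both genes.

    `abstract_genes` is an iterable of sets of gene symbols, one set per
    abstract. Pair keys are alphabetically ordered tuples.
    """
    counts = Counter()
    for genes in abstract_genes:
        for a, b in combinations(sorted(genes), 2):
            counts[(a, b)] += 1
    return counts


def neighbours(counts, gene, min_cocitations=1):
    """Genes connected to `gene` by at least `min_cocitations` abstracts."""
    result = {}
    for (a, b), n in counts.items():
        if n >= min_cocitations and gene in (a, b):
            result[b if a == gene else a] = n
    return result


# Toy data: which gene symbols occur together in three invented abstracts.
toy_abstracts = [
    {"HMGCS1", "SREBF1"},
    {"HMGCS1", "SREBF1", "LDLR"},
    {"LDLR", "APOA1"},
]
toy_counts = cocitation_counts(toy_abstracts)
```

Raising `min_cocitations` plays the role of the co-citation frequency filter: weakly supported connections drop out of the network first.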

86 © 2007 Genomatix Software GmbH - Knowledge based analysis. Here you see the network around HMGCS1, with all other genes connected on GFG level.

87 © 2007 Genomatix Software GmbH - Knowledge based analysis. Here connected transcription factors only on GFG level.

88 © 2007 Genomatix Software GmbH - Knowledge based analysis. Now all connected transcription factors.

89 © 2007 Genomatix Software GmbH - Knowledge based analysis. A connection line between two genes means that there is a bibliographic connection on abstract level (B0)...

90 © 2007 Genomatix Software GmbH - Knowledge based analysis "Mouse over" and clicking gives you more information...

91 © 2007 Genomatix Software GmbH - Knowledge based analysis The green indicates that there is a binding site for SREBF1 (V$SREB) in at least one of the promoters of HMGCS1

92 © 2007 Genomatix Software GmbH - Knowledge based analysis There is more encoded in the connection lines...

93 © 2007 Genomatix Software GmbH - The little symbols give you some information about the gene and its association with pathways Knowledge based analysis

94 © 2007 Genomatix Software GmbH - Knowledge based analysis Some more helpful options from this page... The tagged text tells us that the TF SREBF1 is involved in regulation of HMGCS1

95 © 2007 Genomatix Software GmbH - Knowledge based analysis You can get all info about any gene you click up there... over here... This you know already...

96 © 2007 Genomatix Software GmbH - Knowledge based analysis..as well as this.

97 © 2007 Genomatix Software GmbH - Knowledge based analysis..as well as this. … Hey, hey hey ! Stop it ! I want to know about the regulation of my gene, not to play around with your Biblio...thing!

98 © 2007 Genomatix Software GmbH - Knowledge based analysis … Hey, hey hey ! Stop it ! I want to know about the regulation of my gene, not to play around with your Biblio...thing! BiblioSphere PathwayEdition ! We already found TFs of interest, known to be involved in regulation of our gene. Now let´s see the biological environment of our gene and find a group of related genes which might share some regulatory motifs. Let´s go back and display all genes contained in this network...

99 © 2007 Genomatix Software GmbH - Knowledge based analysis Let´s load the GO- Filter "biological process"...

100 © 2007 Genomatix Software GmbH - Knowledge based analysis Here you see the tree for the selected filter. Expand and collapse by clicking on the +/- Go to the table view by this tab...

101 © 2007 Genomatix Software GmbH - Knowledge based analysis The Z-Score gives you a measure whether certain categories are significantly over- or under-represented by the displayed gene set. Top scoring is sterol and cholesterol metabolism... Everything above 3 is statistically significant! Clicking here opens the tree on the left and highlights the category as well as the resp. genes in the pathway view.
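
The slides do not say how the Z-score is computed. One common way to score over-representation of a category in a gene set, shown here purely as an assumed illustration, compares the observed count with its expectation under random sampling without replacement (hypergeometric mean and variance). The function name and all numbers in the example are hypothetical.

```python
import math


def category_z_score(k, n, K, N):
    """Z-score for observing k category genes in a set of n genes.

    K of the N background genes belong to the category. Mean and variance
    are those of the hypergeometric distribution. This is an assumed
    formula, not necessarily the one used by BiblioSphere.
    """
    p = K / N
    expected = n * p
    variance = n * p * (1 - p) * (N - n) / (N - 1)
    return (k - expected) / math.sqrt(variance)


# Hypothetical numbers: 5 of 10 genes fall into a category that
# contains 100 of 10000 background genes.
z = category_z_score(k=5, n=10, K=100, N=10000)
```

With these invented numbers only 0.1 category genes would be expected by chance, so observing 5 gives a Z-score far above the threshold of 3 mentioned on the slide.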

102 © 2007 Genomatix Software GmbH - Knowledge based analysis This finally applies the filter to your gene set. Superimpose as many filters as you´d like !

103 © 2007 Genomatix Software GmbH - Knowledge based analysis We see two TFs in here, SREBF1 and SREBF2, both Sterol Regulatory Element Binding Protein factors. The "redraw" button Double-click on SREBF1 in order to see all connections to that TF

104 © 2007 Genomatix Software GmbH - Knowledge based analysis Another table view...

105 © 2007 Genomatix Software GmbH - Knowledge based analysis...the colors encode for... Highlight those genes with your mouse, and copy them...

106 © 2007 Genomatix Software GmbH - Knowledge based analysis. Now we have expanded our single input gene with a set of seven additional genes! And we already know quite a lot about them! They are all connected with my original gene in PubMed. All genes belong, with very high statistical significance, to the GO category "Cholesterol Metabolic Process". SREB transcription factors seem to play a role in the regulation of those genes. Now let´s check whether the promoters of those genes share a complex framework. For this we first need to export those genes into GenomatixSuite´s Gene2Promoter.

107 © 2007 Genomatix Software GmbH - Oh my god... more... Where do I find this now ? Back to sequence level Relax ! It´s easy and not far away...

108 © 2007 Genomatix Software GmbH - Paste here the gene symbols which we just copied in BiblioSpherePE Don´t forget this ! Otherwise you will be asked for all findings in all organisms. Back to sequence level APOA1, LDLR, SREBF2, VLDLR, FDFT1, FDPS, MVK, HMGCS1

109 © 2007 Genomatix Software GmbH - Back to sequence level

110 © 2007 Genomatix Software GmbH - You are right! It pretty much is the same display as the comparative genomics page which we have generated earlier. The difference in this case is that we now compare promoters of different genes within one organism… Hey stop ! Haven´t I seen this before ? Back to sequence level

111 © 2007 Genomatix Software GmbH - Back to sequence level Eight loci with 26 different unique promoters! combinations possible for exhaustive analysis! Combinatorial explosion! We have to find a way to circumvent this. Since we are concentrating on SREB TF sites, let's concentrate on those promoters which contain a V$SREB binding site. How should I know which ones? How do I do this? Very easy! Just scroll down to the bottom of the page...

112 © 2007 Genomatix Software GmbH - Back to sequence level Select the desired TF-matrix family here

113 © 2007 Genomatix Software GmbH - Back to sequence level...and all relevant promoters are checked already for you Now we have reduced to 12 different promoters from 8 different loci, each containing at least one SREB site.
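
Conceptually, this selection step keeps only the promoters that carry at least one match from the chosen matrix family. A minimal sketch of that idea follows; the promoter identifiers, the placeholder family names other than V$SREB, and the data layout are hypothetical and do not reflect Gene2Promoter's internals.

```python
from typing import Dict, List, Set


def promoters_with_family(matches: Dict[str, Set[str]], family: str) -> List[str]:
    """Return the promoters whose match set contains the given matrix family."""
    return sorted(p for p, families in matches.items() if family in families)


# Hypothetical match table: promoter ID -> matrix families found in it.
promoter_matches = {
    "HMGCS1_promoter_A": {"V$SREB", "V$OTHER1"},
    "HMGCS1_promoter_B": {"V$OTHER1"},
    "MVK_promoter_A": {"V$SREB", "V$OTHER2"},
    "LDLR_promoter_A": {"V$OTHER2"},
}

selected = promoters_with_family(promoter_matches, "V$SREB")
print(selected)
```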

114 © 2007 Genomatix Software GmbH - Scroll to the bottom of the Gene2Promoter result page... Back to sequence level We have done this before...

115 © 2007 Genomatix Software GmbH - You see? Now we have tolerable combinatorics and can perform an exhaustive promoter analysis. Back to sequence level
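
To get a feeling for why going from 26 to 12 promoters matters, the sketch below counts promoter subsets under one simple assumption: that an exhaustive analysis would have to consider every subset of at least two promoters. This is an illustration of the growth rate only. It is not necessarily how the tool counts combinations, and it does not reproduce the figure on the earlier slide.

```python
from math import comb


def subsets_of_at_least_two(n: int) -> int:
    """Count the subsets of size >= 2 that can be formed from n promoters."""
    return sum(comb(n, k) for k in range(2, n + 1))


for n in (26, 12):
    print(f"{n} promoters -> {subsets_of_at_least_two(n):,} subsets")
```

Under this assumption, every promoter removed from the input roughly halves the number of subsets.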

116 © 2007 Genomatix Software GmbH - Remember? We have been here before, too... Back to sequence level ...but now we choose V$SREB as a mandatory element for our framework. Hint: you can select multiple elements by holding the "Ctrl" key while clicking. ...and with these parameters you have to play around a little bit. Start at default. Gradually relax stringency. Go down in Quorum Constraint step by step, or allow for higher distance variance (e.g. 20, 30, 40, 50, etc.). The lower the distance variance and the more elements per model, the higher the resulting model selectivity.

117 © 2007 Genomatix Software GmbH - Back to sequence level For example, at a quorum of 30%, an allowed distance range of 5 to 200 bp, a distance variance of 50 bp, and a maximum of 10 elements allowed, we find quite a lot of frameworks in the different promoter combinations. There are frameworks with 6 elements! This is quite significant and expected to be extremely selective. Tick the boxes of the models for a subsequent database search for other promoters with similar organization. With 6 elements I expect to find only the 3 genes from which these models were derived: SREBF2, HMGCS1, and MVK.
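
How the quorum and the distance variance interact can be sketched with a toy two-element framework. The code below is not FrameWorker's algorithm. It assumes that "distance variance" means the width of the window in which the distance between two elements may vary across promoters, and that the quorum is the minimum fraction of input promoters that must contain the pair. All site positions and the partner family name are hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (matrix family, position in bp)


def quorum_met(promoters: Dict[str, List[Site]], first: str, second: str,
               base_distance: int, distance_variance: int, quorum: float,
               min_distance: int = 5, max_distance: int = 200) -> bool:
    """Check whether enough promoters contain `first` followed by `second`
    at a distance inside [base_distance, base_distance + distance_variance],
    clipped to the allowed distance range."""
    low = max(base_distance, min_distance)
    high = min(base_distance + distance_variance, max_distance)
    supporting = 0
    for sites in promoters.values():
        distances = [pos_b - pos_a
                     for fam_a, pos_a in sites if fam_a == first
                     for fam_b, pos_b in sites if fam_b == second]
        if any(low <= d <= high for d in distances):
            supporting += 1
    return supporting / len(promoters) >= quorum


# Hypothetical site positions in four promoters.
toy_promoters = {
    "P1": [("V$SREB", 100), ("V$PARTNER", 140)],  # distance 40
    "P2": [("V$SREB", 50), ("V$PARTNER", 110)],   # distance 60
    "P3": [("V$SREB", 10), ("V$PARTNER", 400)],   # distance 390, out of range
    "P4": [("V$PARTNER", 20)],                    # no SREB site
}

strict = quorum_met(toy_promoters, "V$SREB", "V$PARTNER", 30, 10, quorum=0.5)
relaxed = quorum_met(toy_promoters, "V$SREB", "V$PARTNER", 30, 50, quorum=0.5)
print("variance 10 bp:", strict, "| variance 50 bp:", relaxed)
```

In this toy setup, widening the distance window lets a second promoter support the pair, which mirrors the advice to relax stringency step by step when the default settings return no framework.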

118 © 2007 Genomatix Software GmbH - Back to sequence level Scroll all the way down... This list is quite interesting! Here we have the different TF sites in this set of frameworks. This list represents those TFs which we should concentrate on when analyzing the regulation of the original input gene. It is pretty comparable to the list from our phylogenetic approach before. There is now good evidence that those factors play a role in regulation in the biological context of cholesterol metabolism. Now let's see how selective this model is...

119 © 2007 Genomatix Software GmbH - It is just one click away... Back to sequence level

120 © 2007 Genomatix Software GmbH - This should look familiar to you ! But now we are going for the database section... Back to sequence level Unless you have a good reason to do so, always go for the database of promoters of annotated genes. This allows for GO-group Z-scoring of the database hits later on...

121 © 2007 Genomatix Software GmbH - This is a termination parameter. If this number of hits is reached before the end of the database, the search is terminated. Back to sequence level Careful! Some browsers crash with too many hits to display in HTML (>10,000)! A database search usually takes several minutes. In order to avoid a server time-out go for the option. You'll receive a mail with a direct link to your result file (it will be kept in your "Results Directory", too).
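
The termination parameter behaves like a cap on the result list: scanning stops as soon as the cap is reached, even if part of the database has not been searched yet. A minimal sketch of that behaviour, with a hypothetical database and a stand-in match function rather than ModelInspector itself:

```python
from typing import Callable, Iterable, List


def search_database(promoters: Iterable[str],
                    matches_model: Callable[[str], bool],
                    max_hits: int) -> List[str]:
    """Scan promoters in order and stop once max_hits matches are collected."""
    hits: List[str] = []
    for promoter in promoters:
        if matches_model(promoter):
            hits.append(promoter)
            if len(hits) >= max_hits:
                break  # termination parameter reached before the end
    return hits


# Hypothetical database and match rule, for illustration only.
database = [f"promoter_{i}" for i in range(1000)]
hits = search_database(database, lambda p: p.endswith("7"), max_hits=5)
print(hits)
```

A truncated hit list therefore says nothing about the part of the database that was never scanned, which is worth keeping in mind when a search returns exactly the maximum number of hits.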

122 © 2007 Genomatix Software GmbH - Eight matches! Back to sequence level Wow ! In four sequences. Each model matches exactly once per sequence......out of a total of different promoters The three genes of our "training set"...

123 © 2007 Genomatix Software GmbH - Back to sequence level...plus one additional "new" gene! This one was not in our input list and is identified only by common promoter organization!

124 © 2007 Genomatix Software GmbH - Back to sequence level Those four genes now are extremely likely to share common regulation in the given biological context! The TFs in the framework now are the top candidates for further inspection.

125 © 2007 Genomatix Software GmbH - Back to sequence level … STOP!! First I had too many matches in MatInspector, now there are too many slides!!

126 © 2007 Genomatix Software GmbH - New Knowledge I am terribly sorry for that! However, eukaryotic transcriptional regulation is pretty complex. Our group of researchers has worked in this field for more than two decades. As you have seen, our tools, though pretty easy to use, require some explanation and sometimes a slightly different mindset, going beyond looking at single, isolated TF binding sites. I hope I was able to show you some basic strategies to follow. Nevertheless, let's have a final look at the additional gene which we have found with the database search in our example...

127 © 2007 Genomatix Software GmbH - New Knowledge

128 © 2007 Genomatix Software GmbH - New Knowledge

129 © 2007 Genomatix Software GmbH - New Knowledge MMAB is a transferase involved in vitamin B(12) activation and linked to a disease: methylmalonic aciduria

130 © 2007 Genomatix Software GmbH - New Knowledge Feeding all 4 genes from ModelInspector into BiblioSpherePE shows that they are all connected plus...

131 © 2007 Genomatix Software GmbH - In our example, we started with a single gene (HMGCS1; ElDorado), put it into biological context and concentrated on a potential regulator (SREB; BiblioSpherePE), identified common promoter organization (TF framework; GEMS Launcher, FrameWorker), searched for additional genes with similar promoter organization (GEMS Launcher, ModelInspector), and put the genes back into biological context (BiblioSpherePE). Literature confirmed that we indeed found a co-regulated network and identified the molecular basis for it. This could NEVER be achieved by statistical analysis of isolated TFBS.

132 © 2007 Genomatix Software GmbH - There is so much more in GenomatixSuite PE. I have not said a word about matrix generation, or about direct experimental planning for knock-out/knock-in experiments with SequenceShaper. Expand the hit list by shortening the framework, etc. Get in touch with us via and we will give you a tour through the entire system at a web meeting. Some informative links:

