For a promoter to be called bidirectional, it should satisfy two conditions:
1. The adjacent genes are in the head-to-head orientation.
2. Their transcription start sites are not more than 1000 bp apart.
[Figure: Gene 1 and Gene 2 in head-to-head orientation, followed by the three possible arrangements of adjacent genes: head to head, head to tail, and tail to tail.]
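The two conditions can be written as a small check. This is only a sketch: the `Gene` class, the function names, and the example coordinates are hypothetical, and the annotated gene boundary is used as a stand-in for the transcription start site.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # leftmost coordinate on the chromosome
    end: int      # rightmost coordinate on the chromosome
    strand: str   # "+" or "-"

    @property
    def tss(self) -> int:
        # Assumption: transcription starts at the left end of a "+" gene
        # and at the right end of a "-" gene.
        return self.start if self.strand == "+" else self.end


def orientation(left: Gene, right: Gene) -> str:
    """Classify two neighbours; `left` is the gene with the smaller coordinates."""
    if left.strand == "-" and right.strand == "+":
        return "head_to_head"   # divergent: the two 5' ends face each other
    if left.strand == "+" and right.strand == "-":
        return "tail_to_tail"   # convergent: the two 3' ends face each other
    return "head_to_tail"       # both genes lie on the same strand


def is_bidirectional_candidate(left: Gene, right: Gene, max_tss_gap: int = 1000) -> bool:
    """Apply both conditions: head-to-head orientation and TSS gap <= 1000 bp."""
    return (orientation(left, right) == "head_to_head"
            and abs(right.tss - left.tss) <= max_tss_gap)


# Hypothetical example genes, not taken from the annotation.
gene_a = Gene("geneA", 100, 500, "-")
gene_b = Gene("geneB", 900, 1500, "+")
gene_c = Gene("geneC", 2000, 2500, "-")
print(orientation(gene_a, gene_b), is_bidirectional_candidate(gene_a, gene_b))
print(orientation(gene_b, gene_c), is_bidirectional_candidate(gene_b, gene_c))
```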
- Promoters of many co-expressed bidirectional gene pairs are capable of initiating transcription in both directions [human genome]. Trinklein et al. (2004)
- Compared to tail to tail, the head-to-head gene arrangement is more conserved [vertebrates]. Yang et al. (2008)
- Co-expression of adjacent gene pairs [yeast]. S. Kruglyak and H. Tang (2000)
- Orientation affects co-expression of neighboring genes [Arabidopsis thaliana]. Williams et al. (2004)
Search and analysis of possible bidirectional promoters in Arabidopsis thaliana
Arabidopsis thaliana (wall cress / mouse-ear cress)
- Model organism for plants
- Herbaceous dicot (Brassicaceae family)
- Plants of economic importance in the same family: cabbage, broccoli, turnips, mustard, rapeseed
Mining adjacent gene pairs
Microarray data could suggest co-regulation. If the gene pairs are co-expressed:
1. What is the most prevalent intergenic distance?
2. Are there any common motifs?
3. Identification of transcription factors.
4. Head to head vs. the rest.
5. Distance conservation and orientation patterns in Brassica rapa.
Dataset: gene annotation in GFF format from The Arabidopsis Information Resource (TAIR)
ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR8_genome_release
Filtering steps:
1. Collect all adjacent gene pairs within 500 bp.
2. Remove pseudogenes, transposons, and RNAs.
3. Remove duplicates (bl2seq, E-value cutoff 1e-5).
4. Group the remaining pairs by orientation.
Resulting gene pairs:
Head to Head    Head to Tail    Tail to Tail
1369            3807            2674
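The pair-mining step can be sketched as follows. This is a minimal illustration, assuming a standard nine-column GFF file in which protein-coding loci carry the feature type `gene` and an `ID=` attribute; the function names and the toy records are hypothetical, and the bl2seq duplicate filter is not reproduced here.

```python
def parse_genes(gff_lines):
    """Return sorted (chrom, start, end, strand, gene_id) tuples for 'gene' features."""
    genes = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skipping every type other than 'gene' drops pseudogenes, transposons, RNAs.
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", "")))
    return sorted(genes)


def adjacent_pairs(genes, max_gap=500):
    """Pair each gene with its right-hand neighbour if the intergenic gap is <= max_gap."""
    pairs = []
    for left, right in zip(genes, genes[1:]):
        if left[0] != right[0]:
            continue                      # neighbours on different chromosomes
        gap = right[1] - left[2] - 1      # number of intergenic bases
        if gap < 0 or gap > max_gap:
            continue                      # overlapping genes or too far apart
        if left[3] == "-" and right[3] == "+":
            kind = "head_to_head"
        elif left[3] == "+" and right[3] == "-":
            kind = "tail_to_tail"
        else:
            kind = "head_to_tail"
        pairs.append((left[4], right[4], kind, gap))
    return pairs


# Toy records with made-up identifiers and coordinates.
toy_gff = [
    "Chr1\tTAIR8\tgene\t100\t500\t.\t-\t.\tID=G1",
    "Chr1\tTAIR8\tgene\t801\t1500\t.\t+\t.\tID=G2",
    "Chr1\tTAIR8\tgene\t1700\t2500\t.\t-\t.\tID=G3",
    "Chr1\tTAIR8\tpseudogene\t2600\t2700\t.\t+\t.\tID=P1",
    "Chr1\tTAIR8\tgene\t5000\t6000\t.\t+\t.\tID=G4",
]
toy_pairs = adjacent_pairs(parse_genes(toy_gff))
print(toy_pairs)
```

Pairing each gene only with its immediate neighbour by start coordinate is a simplification; nested or overlapping genes would need extra handling.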
Dataset: pre-processed expression data for 22810 probe sets on the Affymetrix Arabidopsis ATH1 (25K) array across 1436 hybridization experiments.
ftp://ftp.arabidopsis.org/home/tair/Microarrays/analyzed_data/affy_data_1436_10132005.zip
1. Start with 1436 Affymetrix Arabidopsis 25K arrays obtained from NASCArrays and AtGenExpress.
2. Normalize the data using the robust multi-array average (RMA) method.
3. Match probes to the gene pairs obtained.
4. For each pair, calculate the correlation coefficient.
5. Based on an appropriate cut-off for the correlation coefficient, select highly co-expressed gene pairs.
6. Plot the percentage of gene pairs against the correlation coefficient.
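The correlation and cut-off steps might look like the sketch below. It assumes a Pearson correlation coefficient and reads the ">=60%" label used on the later slides as a cut-off of 0.6; neither choice is stated explicitly on the slides. The gene names and expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")               # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def highly_coexpressed(pairs, expression, cutoff=0.6):
    """Keep the pairs whose profiles correlate at or above the cutoff."""
    selected = []
    for gene_a, gene_b in pairs:
        if gene_a not in expression or gene_b not in expression:
            continue                      # no probe set matched one of the genes
        r = pearson(expression[gene_a], expression[gene_b])
        if r >= cutoff:
            selected.append((gene_a, gene_b, r))
    return selected


# Made-up normalized expression values across four arrays.
toy_expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
}
toy_selected = highly_coexpressed([("A", "B"), ("A", "C"), ("A", "D")], toy_expression)
print(toy_selected)
```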
We want to test whether highly co-expressed gene pairs are significantly associated with the H_H orientation (potentially containing a bidirectional promoter). Fisher's exact test examines the significance of the association between two variables in a 2 x 2 contingency table. Here the sample is divided into H_H and non-H_H pairs (the first variable) versus highly co-expressed gene pairs and the remaining gene pairs (the second variable). The same test is repeated for H_T and T_T.

          H_H     [H_T]+[T_T]    Fisher's exact P-value
>=60%     55      122            6.20E-06
<=60%     787     3887

          H_T     [H_H]+[T_T]    Fisher's exact P-value
>=60%     79      98             0.2836
<=60%     2288    2386

          T_T     [H_H]+[H_T]    Fisher's exact P-value
>=60%     43      134            0.005855
<=60%     1599    3075
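For readers who want to recompute such P-values, the sketch below implements a two-sided Fisher's exact test directly from the hypergeometric distribution. It assumes the two-sided variant that sums all tables no more probable than the observed one; the slides do not say which variant or software was used, so small differences from the reported values are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum every table that is at most as probable as the observed one
    # (with a small tolerance for floating-point ties).
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Counts from the H_H and H_T tables: rows are >=60% and <=60% pairs.
p_hh = fisher_exact_two_sided(55, 122, 787, 3887)
p_ht = fisher_exact_two_sided(79, 98, 2288, 2386)
print(p_hh, p_ht)
```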
[Figure: three panels, each comparing the >=60 and <=60 groups across intergenic distance categories 1 to 10.]
Question: does the intergenic distance distribution in highly co-expressed gene pairs vary significantly from that in gene pairs with low co-expression?
A leave-one-out technique was used to see which of the distance categories contributed most.
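One way to read the leave-one-out idea is sketched below. It assumes a chi-square statistic over the distance categories, which the slides do not confirm, and it uses made-up counts: each category is dropped in turn, and the category whose removal lowers the statistic most is taken as the largest contributor.

```python
def chi_square(high, low):
    """Chi-square statistic for a 2 x K table of counts per distance category."""
    total_high, total_low = sum(high), sum(low)
    total = total_high + total_low
    statistic = 0.0
    for h, l in zip(high, low):
        column = h + l
        for observed, row_total in ((h, total_high), (l, total_low)):
            expected = row_total * column / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


def leave_one_out(high, low):
    """Return the full statistic and, per category, the drop when it is left out."""
    full = chi_square(high, low)
    drops = []
    for k in range(len(high)):
        reduced_high = high[:k] + high[k + 1:]
        reduced_low = low[:k] + low[k + 1:]
        drops.append(full - chi_square(reduced_high, reduced_low))
    return full, drops


# Hypothetical counts for three distance categories (not the study's data).
toy_high = [30, 10, 10]
toy_low = [10, 20, 20]
full_stat, drops = leave_one_out(toy_high, toy_low)
print(full_stat, drops)
```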
Intergenic regions of highly co-expressed head-to-head pairs were provided to MEME with the following parameters:
- Any number of repetitions of the motif was allowed
- E-value cutoff 0.1
[Figure: three motifs reported, with E = 8.6e-037, E = 3.6e-007, and E = 1.9e-002.]
- Ascorbate oxidase (AO) gene promoter; found in the silencer region; AOBP (AGTA repeat binding protein) has a DOF domain required for repression of AO gene expression.
- Light-responsive element (LRE) found in the parsley (P.c.) CHS-1 (chalcone synthase-1) gene promoter.
- Regulates gene expression during initiation of axillary bud outgrowth in Arabidopsis.
The position-specific probability matrix from MEME was provided to TESS along with the intergenic regions of highly correlating gene pairs in all orientations.

                Pairs with Motif 1    Pairs without Motif 1    Total
Head to Head    20                    35                       55
Head to Tail    12                    67                       79
Tail to Tail    3                     40                       43

                H_H     [H_T]+[T_T]    Fisher's exact P-value
#enriched       20      15             0.0004077
#not enriched   35      107

                H_T     [H_H]+[T_T]    Fisher's exact P-value
#enriched       12      23             0.1883
#not enriched   67      75

                T_T     [H_T]+[H_H]    Fisher's exact P-value
#enriched       3       32             0.0277
#not enriched   40      102
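To illustrate what scanning sequences with a position-specific probability matrix involves, here is a generic log-odds scan. It is not TESS's actual scoring scheme, which the slides do not describe; the matrix, the uniform background, the pseudocount, and the sequence are all made up for the example.

```python
import math

BASES = "ACGT"


def scan(sequence, pspm, background=0.25, pseudocount=0.01, threshold=0.0):
    """Return (position, score) for every window scoring above the threshold.

    `pspm` is a list of rows, one per motif column, holding the
    probabilities of A, C, G, T at that column.
    """
    width = len(pspm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue                      # skip windows containing N or other symbols
        score = 0.0
        for column, base in enumerate(window):
            p = pspm[column][BASES.index(base)]
            # The pseudocount keeps zero probabilities from producing log(0).
            p = (p + pseudocount) / (1 + 4 * pseudocount)
            score += math.log2(p / background)
        if score > threshold:
            hits.append((i, score))
    return hits


# Toy matrix that strongly prefers the word AGTA (illustrative only).
toy_pspm = [
    [0.97, 0.01, 0.01, 0.01],   # A
    [0.01, 0.01, 0.97, 0.01],   # G
    [0.01, 0.01, 0.01, 0.97],   # T
    [0.97, 0.01, 0.01, 0.01],   # A
]
toy_hits = scan("CCAGTACC", toy_pspm)
print(toy_hits)
```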
AT1G09760-AT1G09770: [protein binding, response to cold]-[DNA binding, transcription factor activity, regulation of transcription-defense response signaling pathway]
AT5G23080-AT5G23090: [RNA binding, RNA processing]-[intracellular, transcription factor activity, regulation of transcription]
AT5G64670-AT5G64680: [ribosome, structural constituent of ribosome, translation, ribosome biogenesis and assembly]-[ribosome, structural constituent of ribosome, translation, ribosome biogenesis and assembly]
AT2G40650-AT2G40660: [binding, RNA processing]-[tRNA binding, tRNA aminoacylation for protein translation]
AT3G46030-AT3G46040: [nucleus, DNA binding, nucleosome assembly, nucleosome]-[structural constituent of ribosome, translation, cytosolic small ribosomal subunit]
AT1G23280-AT1G23290: [MAK16 protein-related]-[Encodes a ribosomal protein L27A, a constituent of the large subunit of the ribosomal complex]
AT1G76400-AT1G76405: [endoplasmic reticulum, oligosaccharyl transferase activity, protein amino acid glycosylation]-[similar to chloroplast channel forming outer membrane protein [Pisum sativum] (GB:CAB58442.1)]
AT5G05670-AT5G05680: [endoplasmic reticulum, signal recognition particle binding]-[nuclear pore complex protein-related]
AT2G20480-AT2G20490: [similar to Os09g0446000 [Oryza sativa (japonica cultivar-group)] (GB:NP_001063306.1)]-[Cajal body, nucleolus, RNA binding, polar nucleus fusion]
AT3G56990-AT3G57000: [EDA7 (embryo sac development arrest 7)]-[nucleolar essential protein-related]
AT4G18360-AT4G18370: [glycolate oxidase activity, electron transport, metabolic process]-[chloroplast thylakoid lumen, serine-type peptidase activity, trypsin activity, proteolysis, photosystem II repair]
AT4G33500-AT4G33510: [protein serine/threonine phosphatase activity]-[chloroplast, 3-deoxy-7-phosphoheptulonate synthase activity, aromatic amino acid family biosynthetic process, chorismate biosynthetic process]
AT4G35440-AT4G35450: [membrane, voltage-gated chloride channel activity, chloride transport]-[protein folding, defense response to bacterium, incompatible interaction, protein targeting to chloroplast, integral to chloroplast outer membrane]
AT2G35490-AT2G35500: [chloroplast thylakoid membrane, structural molecule activity]-[shikimate kinase-related]
AT2G37310-AT2G37320: [pentatricopeptide (PPR) repeat-containing protein]-[pentatricopeptide (PPR) repeat-containing protein]
AT1G13030-AT1G13040: [unknown, sphere organelles protein-related; similar to hypothetical protein [Brassica rapa] (GB:ABQ50545.1); contains domain PTHR15197 (PTHR15197)]-[pentatricopeptide (PPR) repeat-containing protein]
AT1G14270-AT1G14280: [prenyl-dependent CAAX protease activity]-[Encodes phytochrome kinase substrate 2. PKS proteins are critical for hypocotyl phototropism.]
AT3G16990-AT3G17000: [TENA/THI-4 family protein; identical to seed maturation protein]-[ubiquitin-protein ligase activity]
AT1G04070-AT1G04080: [P-P-bond-hydrolysis-driven protein transmembrane transporter activity, protein targeting to mitochondrion]-[regulation of timing of transition from vegetative to reproductive phase]
AT1G27390-AT1G27400: [P-P-bond-hydrolysis-driven protein transmembrane transporter activity, protein targeting to mitochondrion]-[ribosome, structural constituent of ribosome, translation]
Dataset: finished BACs of Brassica rapa in FASTA format
ftp://188.8.131.52/pub/brassica/KBr_finished.fasta
Protein sequences of the highly correlating Arabidopsis genes
ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR8_blastsets/TAIR8_pep_20080412
Blastall settings:
- Program: tblastn
- Database: Brassica BACs
- Query: Arabidopsis protein sequences
- E-value cutoff: 1e-20
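Post-processing of the tblastn results could be sketched as below. This assumes the hits were written in BLAST's 12-column tabular format, which the slides do not state, and it keeps the highest-scoring BAC hit per query at the 1e-20 cutoff. The BAC names, the second query identifier, and the hit values are invented for the example.

```python
def best_hits(tabular_lines, max_evalue=1e-20):
    """Keep, per query protein, the highest bit-score hit passing the E-value cutoff."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        query, subject = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue, bits = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        hit = {
            "subject": subject,
            # For tblastn, reversed subject coordinates indicate the minus strand.
            "strand": "+" if s_start <= s_end else "-",
            "start": min(s_start, s_end),
            "end": max(s_start, s_end),
            "bits": bits,
        }
        if query not in best or bits > best[query]["bits"]:
            best[query] = hit
    return best


# Invented example hits in 12-column tabular layout.
toy_blast = [
    "AT1G09760.1\tBAC_A\t80.0\t200\t40\t0\t1\t200\t1500\t900\t1e-50\t300",
    "AT1G09760.1\tBAC_B\t60.0\t200\t80\t0\t1\t200\t100\t700\t1e-30\t150",
    "AT1G09770.1\tBAC_A\t75.0\t200\t50\t0\t1\t200\t2000\t2600\t1e-45\t250",
    "QUERY_X\tBAC_C\t30.0\t100\t70\t0\t1\t100\t10\t310\t1e-5\t50",
]
toy_best = best_hits(toy_blast)
same_bac = toy_best["AT1G09760.1"]["subject"] == toy_best["AT1G09770.1"]["subject"]
print(toy_best, same_bac)
```

With the best hit and strand known for both genes of a pair, the pair can be checked for landing on the same BAC and for keeping its orientation.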
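Rows: orientation in Arabidopsis. Columns: orientation of the matched pair in B. rapa, or matches on different BACs.

                Head to Head    Head to Tail    Tail to Tail    Different BACs    Total
Head to Head    16              1               0               13                30
Head to Tail    0               19              0               23                42
Tail to Tail    0               0               6               19                25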
- The percentage of highly correlating pairs is higher in head to head.
- Highly correlating pairs in head to head fall within 100-150 bp.
- Head-to-head pairs are mostly RNA/DNA/protein binding.
- The UP1ATMSD motif is enriched in head to head.
- Orientation seems to be conserved in B. rapa, but intergenic distance seems to have lower conservation.
1. Adachi N, Lieber MR. Bidirectional gene organization: a common architectural feature of the human genome. Cell 2002, 109(7):807-809.
2. Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. An abundance of bidirectional promoters in the human genome. Genome Res. 2004, 14:62-66.
3. Yang MQ, Taylor J, Elnitski L. Comparative analyses of bidirectional promoters in vertebrates. BMC Bioinformatics 2008, 9 Suppl 6:S9.
4. Kruglyak S, Tang H. Regulation of adjacent yeast genes. Trends in Genetics 2000, 16(3):109-111.
5. Williams EJB, Bowles DJ. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004, 14:1060-1067.