Prediction of Regulatory Elements for Non-Model Organisms

Rachita Sharma, Patricia Evans, Virendra Bhavsar
Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada E3B 5A3

Introduction

Determination of regulatory networks from available data is one of the major challenges in bioinformatics research. A regulatory network of an organism is represented by a set of genes and their regulatory relationships, which indicate how a gene or a group of genes affects (inhibits or activates) the production of other gene products, as shown in Figure 1. Some organisms, such as yeast, Arabidopsis thaliana (thale cress, a plant) and the fruit fly, are investigated very thoroughly by biologists as model organisms. We are developing a system to predict the regulatory relationships of a non-model organism (the target genome), about which less is known, using information about the regulatory relationships of a related model organism (the source genome). If the organisms are closely related, their regulatory relationships are likely to be similar. Differences in the regulatory relationships between organisms can be determined by using data from both the model and non-model organisms. This research started as part of the bioinformatics research component of the Canadian Potato Genome Project.

Analysis

The methodology has been implemented for mapping regulatory elements and their regulatory network. The first step, mapping regulatory elements, has been tested on yeast (Saccharomyces cerevisiae) and Arabidopsis thaliana as the source and target genomes, respectively, which diverged approximately 1.6 billion years ago. For any pair of genomes, only some of the transcription factors from one genome can be mapped to the other, since the evolutionary distance between them leads to many false negatives. In addition, the number of confirmed mappings between any two genomes is unknown, as it depends on the definition of a confirmed mapping used in the experiment.
The predicted transcription factors are compared on the basis of:
- how likely a sequence predicted as a transcription factor is to be a transcription factor of the target genome
- how likely the predicted transcription factor is to correspond to the correct type of transcription factor from the source genome

Therefore, the predicted transcription factors are compared to a set of 1922 available transcription factors of the Arabidopsis thaliana genome to determine the actual number of transcription factors predicted.

Objectives

- Determine associations between the genes that act as regulatory elements (transcription factors and target genes) in model and non-model organisms
- Predict the regulatory relationships in a non-model organism

Figure 1: Example of gene regulatory network (legend: inhibition, activation)
Figure 2: Number of true positives, false positives, false negatives and true negatives for transcription factors identified using TF-Seq, TF-Fam, and TF-SubFam
Figure 3: Number of hit sequences divided into four types (Confirmed, Similar, Other TF and Not TF) using TF-Seq for a BLAST e-value cut-off parameter of 0.1

Results

Transcription factor mapping based on having the same protein domain family performs better than the other two methods, which are based on sequence similarity and on having the same protein domain sub-family, as shown in Figure 2. Also, the predicted transcription factors are of the correct type, as illustrated in Figure 3, and sequences with similar annotation may account for part of the false positives. Figure 4 shows that target gene mapping by finding TFBS motifs in promoters performs better than the other methods. The sequence similarity used in BS-Blast is not useful for mapping target genes, showing that target genes with similar binding sites do not need to have high sequence similarity.
Also, using BS-Nuc to refine the results of BS-Prom with the Nucleosomes Position Prediction tool does not improve performance, showing the effect of the variable positions of the transcription-suppressing nucleosomes.

Methodology

Find transcription factors of the target genome using the available regulatory element information of the source organism, based on:
- Similar sequences (TF-Seq)
- Same protein domain family (TF-Fam)
- Same protein domain sub-family (TF-SubFam)

Map target genes from the source genome to the target genome, based on finding transcription factor binding site (TFBS) motifs in:
- Nucleotide data of the target genome (BS-Seq)
- Promoter data of the target genome (BS-Prom)
- Similar target gene sequences of the source genome in the target genome (BS-Blast)
- Nucleotide data of the target genome, discarding binding sites located in the predicted regions of nucleosome occupancy (BS-Nuc)

Figure 4: Number of true positives, false positives, false negatives and true negatives for target genes identified using BS-Seq, BS-Prom, BS-Blast and BS-Nuc

Conclusion

The results of this work show that TF-Fam and BS-Prom are promising methods for predicting regulatory elements for a non-model organism based on a model organism. These regulatory elements can be used further to predict the regulatory network of the non-model organism. Gene expression data will be used to further refine the regulatory network and to understand how the predicted regulatory relationships correspond to the expression levels of the genes in the data.
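As an illustration of the kind of comparison summarised in Figures 2 and 4, the following is a minimal sketch of a TF-Fam-style mapping followed by a count of true positives, false positives, false negatives and true negatives against a reference set. It is not the authors' implementation: the function names, the gene and family identifiers, and the toy data are all hypothetical, and the sketch assumes that protein domain family annotations are already available for both genomes.

```python
from typing import Dict, Set


def map_by_family(source_tf_families: Dict[str, Set[str]],
                  target_gene_families: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each target gene to the source TFs that share a domain family with it.

    Assumption: sharing at least one domain family is enough to call a mapping.
    """
    mapping: Dict[str, Set[str]] = {}
    for gene, families in target_gene_families.items():
        hits = {tf for tf, tf_families in source_tf_families.items()
                if families & tf_families}
        if hits:
            mapping[gene] = hits
    return mapping


def confusion_counts(predicted: Set[str], known: Set[str],
                     universe: Set[str]) -> Dict[str, int]:
    """Count TP, FP, FN and TN for predicted TFs against a reference set."""
    return {
        "TP": len(predicted & known),             # predicted and known
        "FP": len(predicted - known),             # predicted but not known
        "FN": len(known - predicted),             # known but missed
        "TN": len(universe - predicted - known),  # neither predicted nor known
    }


# Hypothetical toy annotations (identifiers are placeholders, not real data).
source_tfs = {"srcTF1": {"famA"}, "srcTF2": {"famB"}}
target_genes = {
    "tgt1": {"famA"},
    "tgt2": {"famB", "famC"},
    "tgt3": {"famD"},
    "tgt4": set(),
}
known_target_tfs = {"tgt1", "tgt3"}  # stands in for the reference TF set

mapping = map_by_family(source_tfs, target_genes)
counts = confusion_counts(set(mapping), known_target_tfs, set(target_genes))
print(mapping)
print(counts)
```

The same counting function could be applied to the output of any of the mapping methods, which is what makes a side-by-side comparison of the methods possible.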