EXTENDING GENE ANNOTATION WITH GENE EXPRESSION


Chris Stoeckert, Penn Center for Bioinformatics

Extending Gene Annotation with Gene Expression Patterns
Diagram labels: Genomic Sequence; ESTs; Clones, tags, cDNA, oligos; BLAST; arrays, SAGE, differential display; Genes; Sequence similarity; Gene expression
Questions asked of each gene:
  What is it?
  What does it do?
  How does it do it?
  Where/When does it do it?
  What happens when it doesn't do it right?

Data Flow for Annotation by Gene Expression
Diagram labels: ESTs; Clones, tags, cDNA, oligos; arrays, SAGE, differential display; BLAST; Sequence Analysis; Database of Transcribed Sequences; RNA Abundance Database; Pattern Generator and Analysis
Questions addressed: What is the gene? Where is it expressed?
Analyses: Look for co-regulation. Look for networks.

RAD Schema Enhances Annotation by Providing Better Understanding of Data
  Data verification: consistency within an experiment
  Reproducibility: consistency between experiments
  Comparison between platforms: can track data for the same mRNA
  Integration with other resources: DOTS (gene info), Anatomy (sample info)

RAD Schema: Three Sets of Tables Provide Flexibility
Table sets: Experiments, Data, Platforms
Tables shown in the diagram: Anat_rel, SpotResult, ExperimentCondition, StanfordSpotFamily, SpotFamilyResult, Experiment, GenomeSystems, SpotFamily, ExperimentyResult, SynteniSpotFamily, Others...

RAD:DOTS Interface
Databases linked: RNA Abundance Database (RAD), Database of Transcribed Sequences (DOTS)
Diagram labels: Anatomy; Cellular role; Clones/PCR; filter/array; Experiment (probe, other parameters, hyb/wash conditions); Results (signal/%, background, adjustments); SWISS-PROT neighbors (protein); EST clusters (RNA); Genomic sequence (DNA); Regulatory elements

RAD Web Interface choose through DOTS

RAD Web Interface example of retrieving by cell role or library

Expression Pattern Algorithm
Input:
  Files with identifier and value (e.g., IMAGE clone ID, percent of total signal) for each experiment
  Tolerated variance between replicates
  Dynamic range (e.g., top 15% of signals are meaningful)
  Dependencies between experiments
Output:
  Expression patterns based on ratios between experiments (list of bins)
  Variance between replicates
  Genes above background
  Distribution of ratios for use in statistical analysis

Expression Pattern Algorithm
1. Determine the minimum useful value for each group of replicate experiments based on the specified dynamic range. Raise all values below the minimum useful value to equal this value.
2. Determine the ratio (cutratio) that contains a specified percentage of the ratios between replicates. Default = 2.
3. Take ratios between the average values for each group of replicates. Use the median value across groups, or a specified reference group, in the denominator.
4. Bin the ratios. Use powers of the cutratio and the range of ratios to generate the cut-off ratios for each bin. Generate a second set of bins offset from the first to capture ratios that straddle the boundaries of the first set.
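The four steps above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the original pattern program: the function names, the 95% default in `estimate_cutratio`, the half-bin shift for the second set of bins, and the reading of "dynamic range" as "keep the top fraction of signals" are all assumptions made for the example.

```python
import math
from statistics import mean, median


def floor_values(values, top_fraction=0.15):
    """Step 1: raise every value below the minimum useful value up to it.

    Assumption: the minimum useful value is the smallest signal that still
    falls inside the top `top_fraction` of the sorted signals.
    """
    ranked = sorted(values, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    minimum_useful = ranked[keep - 1]
    return [max(v, minimum_useful) for v in values]


def estimate_cutratio(replicate_pairs, fraction=0.95):
    """Step 2: smallest fold change that covers `fraction` of replicate ratios.

    Each pair holds two replicate measurements of the same clone.
    """
    folds = sorted(max(a, b) / min(a, b) for a, b in replicate_pairs)
    index = max(0, math.ceil(len(folds) * fraction) - 1)
    return folds[index]


def bin_index(ratio, cutratio=2.0, offset=0.0):
    """Step 4: bin a ratio by powers of the cutratio.

    Bin k covers cutratio**(k + offset) <= ratio < cutratio**(k + 1 + offset).
    offset=0.5 gives a second set of bins shifted by half a bin (assumed).
    """
    return math.floor(math.log(ratio, cutratio) - offset)


def expression_pattern(group_values, cutratio=2.0, reference=None):
    """Steps 3 and 4 for one gene.

    group_values: one list of (already floored, positive) replicate values
    per experiment group. The denominator is the reference group's average
    if `reference` is given, otherwise the median of the group averages.
    """
    averages = [mean(g) for g in group_values]
    denominator = averages[reference] if reference is not None else median(averages)
    ratios = [a / denominator for a in averages]
    primary = tuple(bin_index(r, cutratio) for r in ratios)
    shifted = tuple(bin_index(r, cutratio, offset=0.5) for r in ratios)
    return primary, shifted
```

For example, `expression_pattern([[10, 10], [30, 30], [100, 100]], reference=0)` yields ratios of 1, 3 and 10 against the first group; with a cutratio of 2 these fall into different bins in the primary and shifted sets, which is how a ratio sitting near a bin edge can still be grouped with its neighbors.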

Statistical Analysis of Patterns
Models the ratios as a multinomial experiment.
The null hypothesis is independence between genes, given a model describing dependencies between experiments (independent, reference, conditional, 1st order Markov).
The expected number of genes in each pattern is calculated from the distribution of ratios and the experiment model.
The likelihood of the observed number of patterns can be calculated (or simulated) using the number of expected patterns.
Simulators throw a weighted die to generate a pattern for each gene and so obtain the number of genes in a specified pattern.
A score is generated using this expected number and the actual number of genes in a pattern.
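As a rough illustration of the simplest case, the sketch below covers only the "independent" experiment model: each experiment's bin is drawn independently from the observed distribution of bins. The function names are hypothetical, and the empirical tail probability returned by `simulate_tail` is one assumed way to compare expected and observed counts; the slides do not specify how the score is actually computed.

```python
import random
from collections import Counter


def bin_frequencies(patterns):
    """Per-experiment distribution of bins, estimated from all genes."""
    n = len(patterns)
    return [
        {b: c / n for b, c in Counter(column).items()}
        for column in zip(*patterns)
    ]


def expected_count(pattern, freqs, n_genes):
    """Expected number of genes showing `pattern` under independence."""
    p = 1.0
    for bin_value, dist in zip(pattern, freqs):
        p *= dist.get(bin_value, 0.0)
    return n_genes * p


def simulate_tail(pattern, freqs, n_genes, observed, trials=1000, seed=0):
    """Fraction of simulated data sets with at least `observed` genes in `pattern`.

    Each simulated gene gets one weighted die throw per experiment.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        count = 0
        for _ in range(n_genes):
            drawn = tuple(
                rng.choices(list(d), weights=list(d.values()))[0] for d in freqs
            )
            if drawn == pattern:
                count += 1
        if count >= observed:
            hits += 1
    return hits / trials
```

A small tail fraction suggests that more genes share the pattern than independence alone would predict. The reference, conditional and Markov models would replace the per-experiment distributions with conditional ones and are not covered here.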

Sample output of pattern program

Extended Annotation from Expression Patterns
RAD: Extract EST data in experiment groups.
Pattern: Find ESTs with the same pattern.
DOTS: Map ESTs to transcribed sequences.
GenBank (GAIA): Get promoters (genomic sequence upstream of transcribed sequences).
TESS: Look for shared transcription factor binding sites.

Extended Annotation from Expression Patterns
Comparison of array data stored in RAD from: HEL, HEL+hemin, CD34, erythroblasts.
Significant cluster of clones down-regulated in erythroblasts: 10/24 clones coded for ribosomal proteins according to DOTS.
Obtained promoters for 4 of these ribosomal proteins from GenBank.
Used TESS to find shared transcription factor binding sites. All contain sites for PU.1, which is antagonistic to red cell growth.
Result is a subset of ribosomal proteins that are co-regulated and a potential mechanism for co-regulation.

Summary
genomic sequence, ESTs: candidate genes
DOTS: integrated info
RAD: facilitate analysis
Pattern: co-regulation, candidate function, candidate genes with related role

Acknowledgements