Summarized by Sun Kim SNU Biointelligence Lab.

Highly Specific Localization of Promoter Regions in Large Genomic Sequences by PromoterInspector: A Novel Context Analysis Approach. Matthias Scherf et al., Journal of Molecular Biology, 2000.

Promoter: DNA sequences near the beginning of genes. Function: to mediate and control the initiation of transcription of the part of a gene located immediately downstream (3’) of the promoter. (C) 2001, SNU Biointelligence Lab, http://bi.snu.ac.kr/

(cont’d) The DNA region required to fulfill this function can be determined by assays for promoter function in a heterologous context. Complex regulation often involves many more features than just the promoter, e.g. enhancers and locus control regions. If any of these units, which are functionally completely different from promoters, happens to be located adjacent to the promoter, delineation of the promoter becomes difficult. This is one of the reasons why promoter prediction programs focus almost exclusively on proximal promoter regions, or even just on the core promoter.

Transcriptional promoters: Transcription can proceed only after a competent transcription complex, consisting of RNA polymerase II and several general transcription factors, has been recruited to the promoter.

Transcription

Schematic structure of polymerase II promoter

Assembly of the activator/promoter complex on the proximal and core promoter region

Aim of Promoter Recognition: To locate an important part of the regulatory region of a gene. Promoter prediction can be useful in the context of gene prediction, because the promoter marks the beginning of the first exon of a gene. Promoter regions contain information complementary to the exons and introns, because transcriptional regulation cannot be deduced from the predicted amino acid sequence. Transcriptional regulation can play an important part in gene function, so a promoter may yield first clues towards the function of a completely anonymous protein. Prediction of the functionality of a promoter would also be welcome for gene therapy approaches, to improve expression of newly created vector constructs.

Introduction. PromoterInspector: Locates eukaryotic polymerase II promoter regions in large genomic sequences with a high degree of specificity. Focuses on the genomic context of promoters. Based on libraries of IUPAC words extracted from training sequences by an unsupervised learning approach. Polymerase II promoters: Do not contain any sequence elements that are consistently shared. Usually consist of multiple binding sites for transcription factors that must occur in a specific context, apparently shared only by a small group of promoters. The combination and orientation of the transcription factor binding sites is the crucial information.

Promoter prediction approaches: (1) Heuristic approaches. (2) Approaches that attempt to recognize core promoter elements such as TATA boxes, CAAT boxes, and INR (initiator elements at transcription initiation sites). (3) Approaches that attempt to use the whole ensemble of elements (transcription factor binding sites, oligonucleotides) found in a promoter.

(cont’d) Heuristic approaches: Use models that describe the orientation and context of several transcription factor binding sites. Have been shown to detect promoters with a very high level of specificity, but with limited coverage. Useful for predicting specific promoter classes.

(cont’d) Approaches that attempt to recognize promoter elements: Most predict on the order of one promoter per kilobase in human DNA (Fickett & Hatzigeorgiou, 1997). The average distance between functional promoters has been estimated to be in the range of 30 to 40 kb, with a very uneven distribution. Most of the predicted promoters are therefore false positives. Some of the tools use a more restrictive approach to reduce the total number of predictions, but the problem remains. This many false positive matches precludes experimental verification.
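
The size of the false-positive problem follows directly from the two rates quoted on this slide. The sketch below only works through that arithmetic; it assumes, optimistically, that every real promoter is hit by exactly one prediction, and the function name is illustrative rather than taken from the paper.

```python
def min_false_positive_fraction(kb_per_prediction, kb_per_real_promoter):
    """Lower bound on the fraction of predictions that are false positives.

    Assumption (best case for the predictor): every real promoter is covered
    by exactly one prediction, so all remaining predictions are false.
    """
    predictions_per_real_promoter = kb_per_real_promoter / kb_per_prediction
    return 1.0 - 1.0 / predictions_per_real_promoter


# One prediction per kb, one functional promoter per 30 to 40 kb.
for spacing_kb in (30, 40):
    fraction = min_false_positive_fraction(1, spacing_kb)
    print(f"one real promoter per {spacing_kb} kb: "
          f"at least {fraction:.1%} of predictions are false positives")
```

Under these assumptions, roughly 29 out of every 30 (or 39 out of every 40) predictions cannot correspond to a functional promoter.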

Design of the prediction system: Polymerase II promoters are quite different in terms of individual organization, but are probably embedded in a common genomic context. Specific features of such a putative context are not yet known. The system is therefore based on context features extracted from training sequences by an unsupervised learning technique.

Definition of context features: Based on an approach using oligonucleotides with one variable mismatch (Wolfertstetter et al., 1996). This approach was extended by introducing wildcards at multiple positions.

(cont’d) Context features are defined by disjoint groups of similar IUPAC words (IUPAC groups). Each IUPAC group is uniquely defined by a set of oligonucleotides and a number of undefined base pairs (wildcards, “N”). The IUPAC words of an IUPAC group contain all elements of the oligonucleotide set in the same order and orientation, and differ in the number of wildcards between them.

(cont’d) Wildcards at the beginning and end of IUPAC words are discarded. Example: the IUPAC group that results from two wildcards and the oligonucleotide set (AGC, GCA) → (AGCGCA, AGCNGCA, AGCNNGCA).
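
The example above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name is hypothetical, and for sets of more than two oligonucleotides it assumes that the wildcard number is a total budget shared across all gaps, which the slides do not specify.

```python
from itertools import product


def iupac_group_words(oligos, max_wildcards):
    """Enumerate the IUPAC words of one IUPAC group.

    oligos        -- ordered oligonucleotide set, e.g. ["AGC", "GCA"]
    max_wildcards -- wildcard budget; assumed here to be shared by all gaps
    Wildcards ("N") are only placed between oligonucleotides, never at the
    beginning or end of a word.
    """
    gaps = len(oligos) - 1
    words = []
    for counts in product(range(max_wildcards + 1), repeat=gaps):
        if sum(counts) > max_wildcards:
            continue  # over the assumed wildcard budget
        word = oligos[0]
        for n_wildcards, oligo in zip(counts, oligos[1:]):
            word += "N" * n_wildcards + oligo
        words.append(word)
    return words


print(iupac_group_words(["AGC", "GCA"], 2))
# ['AGCGCA', 'AGCNGCA', 'AGCNNGCA']
```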

Definition of decision instances: Prediction of the genomic promoter context is based on several decision instances (classifiers). A classifier is defined by two disjoint sets of IUPAC groups: promoter-related IUPAC groups → “promoter”, and non-promoter-related IUPAC groups → “non-promoter”. The classification is based on IUPAC group matches.

(cont’d) The IUPAC group candidates of a classifier are extracted directly from a set of training sequences. All IUPAC groups that match at least once in these sequences are included. Example: if IUPAC groups are defined by a set of two oligonucleotides of length two and two wildcards, the training sequence AGCTG yields the candidates (AGCT, AGNCT, AGNNCT), (AGTG, AGNTG, AGNNTG), (GCTG, GCNTG, GCNNTG).
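
The candidate extraction in this example can be sketched as follows. The code is an illustration restricted to the case shown on the slide (pairs of oligonucleotides); the function names are hypothetical and overlapping oligonucleotides are assumed not to form a group.

```python
def extract_candidates(sequence, oligo_len=2, max_wildcards=2):
    """Collect oligonucleotide pairs whose IUPAC group matches the sequence.

    A pair (first, second) is a candidate if `second` starts between 0 and
    `max_wildcards` bases after the end of `first` somewhere in the sequence.
    """
    candidates = []
    for i in range(len(sequence) - oligo_len + 1):
        first = sequence[i:i + oligo_len]
        for gap in range(max_wildcards + 1):
            j = i + oligo_len + gap
            if j + oligo_len > len(sequence):
                break  # second oligonucleotide would run past the end
            pair = (first, sequence[j:j + oligo_len])
            if pair not in candidates:
                candidates.append(pair)
    return candidates


def group_words(pair, max_wildcards=2):
    """Expand an oligonucleotide pair into the IUPAC words of its group."""
    first, second = pair
    return [first + "N" * k + second for k in range(max_wildcards + 1)]


for pair in extract_candidates("AGCTG"):
    print(group_words(pair))
# ['AGCT', 'AGNCT', 'AGNNCT']
# ['AGTG', 'AGNTG', 'AGNNTG']
# ['GCTG', 'GCNTG', 'GCNNTG']
```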

Determining candidates: Given a set of promoter and non-promoter sequences (training sequences), a candidate is assigned to the class promoter if the ratio between its number of hits in the promoter training sequences and in the non-promoter training sequences exceeds a certain threshold (the “assignment threshold”).
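
A minimal sketch of this assignment rule is given below. Several details are assumptions, since the slides do not state them: a "hit" is counted once per training sequence that contains any word of the group, the wildcard N is taken to match A, C, G or T, a candidate is assigned to "non-promoter" when the inverse ratio exceeds the threshold, and no normalization for differing training-set sizes is applied. All function names are hypothetical.

```python
import re


def count_hits(words, sequences):
    """Number of sequences containing at least one IUPAC word of the group."""
    pattern = re.compile("|".join(w.replace("N", "[ACGT]") for w in words))
    return sum(1 for seq in sequences if pattern.search(seq))


def assign_class(promoter_hits, non_promoter_hits, threshold=1.0):
    """Assign a candidate group by its hit ratio (assumed symmetric rule).

    The comparisons are written as products so that a group with zero hits
    in one class does not cause a division by zero.
    """
    if promoter_hits > threshold * non_promoter_hits:
        return "promoter"
    if non_promoter_hits > threshold * promoter_hits:
        return "non-promoter"
    return None  # ratio does not exceed the threshold either way


# Toy training data, invented for illustration only.
group = ["AGCT", "AGNCT", "AGNNCT"]
promoters = ["TTAGCTAA", "AGGCTT", "CCCC"]
non_promoters = ["AGTTTT", "GGGG"]

p_hits = count_hits(group, promoters)
n_hits = count_hits(group, non_promoters)
print(p_hits, n_hits, assign_class(p_hits, n_hits))
```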

Architecture of the prediction system: PromoterInspector is based on three classifiers. Each classifier is specialized to differentiate between promoters and one of the non-promoter sequence sets: exon, intron and 3’-UTR. A sequence is assigned to the class promoter only if all three classifiers agree.
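
The unanimity rule can be sketched as below. The slides only say that classification is "based on IUPAC group matches", so the per-classifier decision used here (more promoter-related groups matching than non-promoter-related groups) is an assumption, as are the function names and the toy IUPAC groups.

```python
import re


def count_group_matches(groups, sequence):
    """Number of IUPAC groups with at least one word matching the sequence."""
    total = 0
    for words in groups:
        pattern = "|".join(w.replace("N", "[ACGT]") for w in words)
        if re.search(pattern, sequence):
            total += 1
    return total


def classify(sequence, promoter_groups, non_promoter_groups):
    """Assumed decision rule: promoter if promoter-related groups dominate."""
    return (count_group_matches(promoter_groups, sequence)
            > count_group_matches(non_promoter_groups, sequence))


def predict_promoter(sequence, classifiers):
    """Report 'promoter' only if all classifiers agree."""
    return all(classify(sequence, pos, neg) for pos, neg in classifiers.values())


# Toy classifiers with invented groups: (promoter groups, non-promoter groups).
classifiers = {
    "exon": ([["GCGC", "GCNGC"]], [["ATGA"]]),
    "intron": ([["CGCG"]], [["GTAA"]]),
    "3'-UTR": ([["GGCG"]], [["AATAAA"]]),
}

print(predict_promoter("TTGCGCGGCGTT", classifiers))        # True
print(predict_promoter("TTGCGCGGCGTTAATAAA", classifiers))  # False
```

In the second call the toy 3’-UTR classifier sees one match on each side and votes against, so the sequence is rejected even though the other two classifiers vote for "promoter".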

Parameter optimization of the prediction system. Parameters: the number of wildcards, and the number and length of the elements in the oligonucleotide sets which define the IUPAC groups.

(cont’d) Optimization: Three cross-validation sets were prepared. From a given set, 90% was used for parameter optimization and 10% for evaluation. A set of different parameter constellations was generated. A classifier was built for every parameter constellation and the best classifier was kept. The resulting best classifiers were evaluated on the evaluation set → the optimal assignment threshold is 1.
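
The search over parameter constellations can be pictured as a plain grid search. The sketch below is generic scaffolding rather than the procedure from the paper: the parameter ranges, the helper names and the scoring function are all placeholders, and in practice the score would be the performance of the classifier built with a given constellation on the optimization portion of the data.

```python
import random
from itertools import product


def split_90_10(items, seed=0):
    """Shuffle and split a data set into 90% optimization and 10% evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 9) // 10
    return shuffled[:cut], shuffled[cut:]


def parameter_constellations(wildcards, oligo_counts, oligo_lengths):
    """All combinations of the three parameters named on the previous slide."""
    return [
        {"wildcards": w, "n_oligos": n, "oligo_len": length}
        for w, n, length in product(wildcards, oligo_counts, oligo_lengths)
    ]


def best_constellation(constellations, score):
    """Keep the constellation whose classifier scores best."""
    return max(constellations, key=score)


optimization_set, evaluation_set = split_90_10(range(20))
grid = parameter_constellations([1, 2], [2, 3], [3, 4])  # placeholder ranges


def toy_score(constellation):
    # Placeholder only; a real score would build and test a classifier.
    return sum(constellation.values())


print(len(optimization_set), len(evaluation_set), len(grid))
print(best_constellation(grid, toy_score))
```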

Results of the three classifiers

Application technique: Identification of promoter regions in large genomic sequences is performed by a sliding window approach. A window is moved over the sequence and its content is classified. A promoter region is reported if a certain number of consecutive windows are identified as members of the promoter class. This requires parameter optimization of the length of the window, the offset between two consecutive windows, and the number of consecutive hits. Values used: window size 100, offset 4, number of consecutive hits 24.
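
The sliding window scan can be sketched as follows, using the parameter values from this slide as defaults. The window classifier is passed in as a function; the G-content rule in the usage example is a hypothetical stand-in for the three-classifier decision, and reporting a region from the start of the first window to the end of the last window of a run is an assumption.

```python
def find_promoter_regions(sequence, is_promoter, window=100, offset=4, min_hits=24):
    """Report (start, end) regions covered by runs of promoter-class windows.

    A region is reported when at least `min_hits` consecutive windows are
    classified as promoter. `end` is exclusive.
    """
    regions = []
    run_start, last_start, run_len = None, None, 0
    for start in range(0, len(sequence) - window + 1, offset):
        if is_promoter(sequence[start:start + window]):
            if run_len == 0:
                run_start = start
            last_start = start
            run_len += 1
        else:
            if run_len >= min_hits:
                regions.append((run_start, last_start + window))
            run_len = 0
    if run_len >= min_hits:  # run that extends to the end of the sequence
        regions.append((run_start, last_start + window))
    return regions


# Stand-in classifier for illustration only: more than half of the window is G.
def g_rich(window_seq):
    return window_seq.count("G") > len(window_seq) / 2


long_island = "A" * 200 + "G" * 300 + "A" * 200
short_island = "A" * 200 + "G" * 60 + "A" * 200

print(find_promoter_regions(long_island, g_rich))   # [(152, 548)]
print(find_promoter_regions(short_island, g_rich))  # []
```

The short G-rich stretch produces only 15 consecutive promoter-class windows, so it falls below the threshold of 24 and nothing is reported.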

Fickett’s evaluation data set: Consists of 24 promoters covering a total of 33,120 bp.

Results for the Fickett data set

Description of the large genomic sequences

Summary of large genomic sequence analysis

http://genomatix.gsf.de/cgi-bin/promoterinspector/promoterinspector.pl