Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S051044.

Slides:



Advertisements
Similar presentations
MicroARNs, comparaison, prediction
Advertisements

ECG Signal processing (2)
An Introduction of Support Vector Machine
Optimization of SVM Parameters for Promoter Recognition in DNA Sequences Robertas Damaševičius Software Engineering Department, Kaunas University of Technology.
Naveen K. Bansal and Prachi Pradeep Dept. of Math., Stat., and Comp. Sci. Marquette University Milwaukee, WI (USA)
MiRNA in computational biology 1 The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig C. Mello for their discovery of "RNA interference.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Training a Neural Network to Recognize Phage Major Capsid Proteins Author: Michael Arnoult, San Diego State University Mentors: Victor Seguritan, Anca.
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Computational biology seminar
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Presenting: Asher Malka Supervisor: Prof. Hermona Soreq.
MicroRNA genes Ka-Lok Ng Department of Bioinformatics Asia University.
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
Lecture 12 Splicing and gene prediction in eukaryotes
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Supervisor: Yihong Jennifer Tan Eric Gähwiler Karim Hamidi
Protein Tertiary Structure Prediction
Identifying Computer Graphics Using HSV Model And Statistical Moments Of Characteristic Functions Xiao Cai, Yuewen Wang.
Efficient Model Selection for Support Vector Machines
MicroRNA Targets Prediction and Analysis. Small RNAs play important roles The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig.
Semantic Similarity over Gene Ontology for Multi-label Protein Subcellular Localization Shibiao WAN and Man-Wai MAK The Hong Kong Polytechnic University.
Identifying and classifying functional small RNAs from pine Ryan Morin BC Genome Sciences Centre (presenting research conducted in the lab of Dr. Peter.
Presented by Tienwei Tsai July, 2005
From Genomic Sequence Data to Genotype: A Proposed Machine Learning Approach for Genotyping Hepatitis C Virus Genaro Hernandez Jr CMSC 601 Spring 2011.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
GA-Based Feature Selection and Parameter Optimization for Support Vector Machine Cheng-Lung Huang, Chieh-Jen Wang Expert Systems with Applications, Volume.
1 A Feature Selection and Evaluation Scheme for Computer Virus Detection Olivier Henchiri and Nathalie Japkowicz School of Information Technology and Engineering.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong.
Computational Identification of Drosophila microRNA Genes Journal Club 09/05/03 Jared Bischof.
1 Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine Chenghai Xue, Fei Li, Tao He,
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
© Wiley Publishing All Rights Reserved. RNA Analysis.
PreDetector : Prokaryotic Regulatory Element Detector Samuel Hiard 1, Sébastien Rigali 2, Séverine Colson 2, Raphaël Marée 1 and Louis Wehenkel 1 1 Department.
RNA Structure Prediction
Exploring Alternative Splicing Features using Support Vector Machines Feature for Alternative Splicing Alternative splicing is a mechanism for generating.
Spam Detection Ethan Grefe December 13, 2013.
Meng-Han Yang September 9, 2009 A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.
MicroRNAs and Other Tiny Endogenous RNAs in C. elegans Annie Chiang JClub Ambros et al. Curr Biol 13:
Questions?. Novel ncRNAs are abundant: Ex: miRNAs miRNAs were the second major story in 2001 (after the genome). Subsequently, many other non-coding genes.
Background & Motivation Problem & Feature Construction Experiments Design & Results Conclusions and Future Work Exploring Alternative Splicing Features.
Mark D. Adams Dept. of Genetics 9/10/04
Improving Intergenic miRNA Target Genes Prediction Rikky Wenang Purbojati.
Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set.
GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.
Improving Support Vector Machine through Parameter Optimized Rujiang Bai, Junhua Liao Shandong University of Technology Library Zibo , China { brj,
Computational prediction of miRNA and miRNA-disease relationship
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
Motif Search and RNA Structure Prediction Lesson 9.
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Combining Evolutionary Information Extracted From Frequency Profiles With Sequence-based Kernels For Protein Remote Homology Detection Name: ZhuFangzhi.
MicroRNA Prediction with SCFG and MFE Structure Annotation Tim Shaw, Ying Zheng, and Bram Sebastian.
杜嘉晨 PlantMiRNAPred: efficient classification of real and pseudo plant pre-miRNAs.
Abstract Premise Figure 1: Flowchart pri-miRNAs were collected from miRBase 10.0 pri-miRNAs were compared to hsa and ptr genomes using BlastN and potential.
RNA Structure Prediction
NCode TM miRNA Analysis Platform Identifies Differentially Expressed Novel miRNAs in Adenocarcinoma Using Clinical Human Samples Provided By BioServe.
Next, this study employed SVM to classify the emotion label for each EEG segment. The basic idea is to project input data onto a higher dimensional feature.
bacteria and eukaryotes
How to forecast solar flares?
Predicting RNA Structure and Function
Support Vector Machine (SVM)
Introduction to Bioinformatics II
Identification and Characterization of pre-miRNA Candidates in the C
Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets  Benjamin P. Lewis, Christopher B. Burge,
Manisha Panta, Avdesh Mishra, Md Tamjidul Hoque, Joel Atallah
Presentation transcript:

Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S051044

 Background  Results and Discussion  Conclusion  Methods Outline

Background  MicroRNAs (miRNA) are non-coding RNAs about 21–26 nucleotide (nt) in length that play important regulatory roles.  MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures.  During the biogenesis procedure of a mature miRNA, the hairpin structure of premiRNA acts as not only the structure motif for Exportin-5,in nuclear-cytoplasm transportation, but also a substrate for Dicer enzyme.

Background  As a characteristic secondary structure, the hairpin of pre-miRNA is an important feature used in the computational identification of miRNAs.  In this study, we focus on the ab initio classification of real pre- miRNA from other hairpin sequences with similar stem-loop features (we call them as pseudo pre-miRNAs).A set of novel features that combines the local continuous structure and sequence information of the stem-loops are proposed.  SVM is used to classify two classes based on the features. SVM has been widely applied to the prediction and classification of important biology signals such as promoters, translation initiation sites, splicing sites and proteins.

Results and Discussion  Human miRNA precursors The sequences of human pre-miRNAs are downloaded from the miRNA registry database,which contains 207 reported pre-miRNA entries from Homo sapiens. Only the pre-miRNAs whose secondary structures do not contain multiple loops are considered, which gives us 193 pre-miRNAs, covering more than 93% of all the reported human pre-miRNAs.  Pseudo and candidate miRNA hairpins Two datasets of pre-miRNA-like hairpins are built. They are sequence segments that have similar stem-loop structures as genuine pre-miRNAs but have not been reported as pre-miRNAs. According the way to collect them, we call them as the "CODING" and the "CONSERVED-HAIRPIN" datasets.

Results and Discussion criteria  The secondary structures of the extracted segments are predicted using RNAfold. The criteria for selecting the pseudo-miRNAs from the segments are: minimum of 18 base pairings on the stem of the hairpin structure (included the GU wobble pairs), maximum of -15 kcal/ mol free energy of the secondary structure, and no multiple loops. These criteria ensure that the extracted pseudo pre-miRNAs are similar to real pre-miRNAs according to the widely accepted characteristics.  The CODING dataset is collected from the protein coding regions.Totally, 8,494 pre-miRNA-like hairpins are collected in this dataset.  The CONSERVED-HAIRPIN dataset is extracted from the genome region of position 56,000,001 to 57,000,000 on human chromosome 19. 2,444 hairpins compose the CONSERVED-HAIRPIN dataset.It should be noted that, unlike the CODING set, there might be a few true miRNAs among these segments (hsa-mir- 99b, hsa-let-7e and hsa-mir-125a).

Results and Discussion  The CODING dataset is used as negative samples in the training and validation of the SVM classifier, and the CONSERVED-HAIRPIN dataset is used as a candidate dataset to evaluate how the classifier works on the genome.

Results and Discussion ---- Training and test sets for classification experiments One training set TR-C: 163 human pre-miRNAs (positive samples) (randomly selected from the 193 human premiRNAs) 168 pseudo pre-miRNAs (negative samples) (randomly selected from the CODING dataset) Two test setsTE-C: the remaining 30 human pre-miRNAs not used in TR-C pseudo pre-miRNAs randomly picked up from the CODING dataset. The CONSERVED-HAIRPIN dataset is the second test set.

Results and Discussion  CROSS-SPECIES test set Through caculting similarity by BLASTCLUST, 581 pre-miRNAs from the 11 species remained for the test experiment. We refer this set of pre-miRNAs as the CROSS-SPECIES test set.  Latest human miRNA updated set Through a series of processing,the remaining 39 pre-miRNAs are used as the UPDATED test set.

Results and Discussion ----The local contiguous structure-sequence features  The RNA secondary structures are predicted using RNAfold. In the predicted secondary structure, there are only two statuses for each nucleotide, paired or unpaired,indicated by brackets ("("or")") and dots ("."), respectively.  For any 3 adjacent nucleotides, there are 8 (23) possible structure compositions: "(((", "((.", "(..", "(.(", ".((", ".(.", "..(" and "...". Considering the middle nucleotide among the 3, there are 32 (4 × 8) possible structure-sequence combinations, which we denote as "U(((", "A((.", etc. This defines our triplet structure-sequence elements.

Results and Discussion ----The local contiguous structure-sequence features Using the triplet elements to represent the local structure sequence features of the hairpin.

Results and Discussion ----The local contiguous structure-sequence features  The average appearance frequencies of the triplet elements in the two classes (real pre-miRNA vs. pseudo-miRNA hairpins).

Results and Discussion  Classification performance of the triplet-SVM classifier on test sets TE-C, CONSERVED-HAIRPIN and UPDATED.

Results and Discussion  The discriminative power of top 15 triplet elements. The discriminative power of the triplet element features that distinguish pre-miRNAs from other similar hairpins are calculated using the F value and the 15 most discriminative triplet elements are listedhere. The μ+, μ- and σ+, σ- are the means and standard deviations of the elements in the two classes estimated with the training dataset.

Results and Discussion  SVM performs better when taking the sequence information into the triplet elements, than using just the 8 triplet structure features without the nucleotide information. 1. We trained another SVM classifier with same training dataset using only the 8 triplet structure features. When being applied to the set TE-C, the SVM classifier correctly recognized 29 out of 30 human premiRNAs(sensitivity 96.7%), but it only detected 636pseudo-miRNAs as negative (specificity 63.6%). 2. On the CONSERVED-HAIRPIN set, that SVM classifier identified 1702 out of the 2444 potential hairpin structures as false pre-miRNAs, which gave specificity of or above 69.6%. We can see that the specificity can be greatly improved when using both structure and sequence information.

Results and Discussion  Prediction accuracy on test set CROSS-SPECIES by SVM trained with human data.  Testing on the latest human-specific miRNAs.

Conclusion  Ab initio method for distinguishing true pre-miRNAs from other pre-miRNA-like hairpin structures is important for discovering new and species-specific miRNAs. For this purpose, a set of novel features (the triplet elements) to describe local contiguous structure-sequence characteristics are extracted, and support vector machine is applied with these features to classify real vs. pseudo pre-miRNAs achieving about 90% accuracy on human test data.  Remarkably, the triplet-SVM classifier built on humandata can correctly classify up to 90% of the pre-miRNAs from the other 11 species including plants and virus without utilizing any comparative genomics information andan accuracy of 92.3% is achieved on the newly reportednovel human miRNAs.  The local structure-sequence features may contain distinctive and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.

Methods---- SVM  For a given data set (i = 1,... N) with corresponding labels yi (yi = +1 or -1, representing the two classes to be classified, as real premiRNA vs. pseudo pre- miRNA in this study), SVM gives a decision function (classifier): are the coefficients to be learned and K is a kernel function. Parameters are trained through maximizing The LibSVM package is used. To obtain SVM classifier with optimal performance, the penalty parameter C and the RBF kernel parameter γ are tuned based on the training set using the grid search strategy in LibSVM.

That's all,Thank you! That's all,Thank you!