Identification and Characterization of pre-miRNA Candidates in the C

Slides:



Advertisements
Similar presentations
Improving miRNA Target Genes Prediction Rikky Wenang Purbojati.
Advertisements

The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
MiRNA in computational biology 1 The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig C. Mello for their discovery of "RNA interference.
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Computational biology seminar
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Presenting: Asher Malka Supervisor: Prof. Hermona Soreq.
LECTURE 5: DNA, RNA & PROTEINS
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
MicroRNA Targets Prediction and Analysis. Small RNAs play important roles The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
EXPLORING DEAD GENES Adrienne Manuel I400. What are they? Dead Genes are also called Pseudogenes Pseudogenes are non functioning copies of genes in DNA.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
MicroRNA identification based on sequence and structure alignment Presented by - Neeta Jain Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong.
1 Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine Chenghai Xue, Fei Li, Tao He,
Organizing information in the post-genomic era The rise of bioinformatics.
Biological Databases Biology outside the lab. Why do we need Bioinfomatics? Over the past few decades, major advances in the field of molecular biology,
© Wiley Publishing All Rights Reserved. RNA Analysis.
Sackler Medical School
Lettuce/Sunflower EST CGPDB project. Data analysis, assembly visualization and validation. Alexander Kozik, Brian Chan, Richard Michelmore. Department.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
1 From Mendel to Genomics Historically –Identify or create mutations, follow inheritance –Determine linkage, create maps Now: Genomics –Not just a gene,
__________________________________________________________________________________________________ Fall 2015GCBA 815 __________________________________________________________________________________________________.
Motif Search and RNA Structure Prediction Lesson 9.
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S
Starter What do you know about DNA and gene expression?
bacteria and eukaryotes
Transcription.
EGASP 2005 Evaluation Protocol
The Transcriptional Landscape of the Mammalian Genome
3. genome analysis.
Gene Regulation and Expression
RNA-Seq analysis in R (Bioconductor)
EGASP 2005 Evaluation Protocol
miRNA genomic organization, biogenesis and function
GENE EXPRESSION AND REGULATION
Lettuce/Sunflower EST CGPDB project.
Predicting RNA Structure and Function
Transcription and Gene Expression
DNA Test Review.
Predicting Active Site Residue Annotations in the Pfam Database
Predicting Genes in Actinobacteriophages
Transcription in Prokaryotic (Bacteria)
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Introduction to Bioinformatics II
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
EXTENDING GENE ANNOTATION WITH GENE EXPRESSION
RNA Chapter 13.1.
The transcription process is similar to replication.
mRNA Degradation and Translation Control
Gracjan Michlewski, Sonia Guil, Colin A. Semple, Javier F. Cáceres 
Volume 11, Issue 4, Pages (April 2018)
Noncoding RNA roles in Gene Expression
DNA, RNA & PROTEINS The molecules of life.
Volume 48, Issue 5, Pages (December 2012)
Volume 19, Issue 1, Pages (July 2005)
Volume 14, Issue 7, Pages (February 2016)
From Mendel to Genomics
Module Recognition Algorithms
Volume 151, Issue 7, Pages (December 2012)
Baekgyu Kim, Kyowon Jeong, V. Narry Kim  Molecular Cell 
LECTURE 5: DNA, RNA & PROTEINS
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Fiona T van den Berg, John J Rossi, Patrick Arbuthnot, Marc S Weinberg 
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Volume 11, Issue 7, Pages (May 2015)
Genome-wide Functional Analysis Reveals Factors Needed at the Transition Steps of Induced Reprogramming  Chao-Shun Yang, Kung-Yen Chang, Tariq M. Rana 
Derek de Rie and Imad Abuessaisa Presented by: Cassandra Derrick
Presentation transcript:

Identification and Characterization of pre-miRNA Candidates in the C Identification and Characterization of pre-miRNA Candidates in the C. mitchellii Genome I have compiled a library of candidate microRNA sequences in the C. mitchellii genome and analyzed these sequences based on the secondary hairpin structures of their pre-miRNA, then created plots that can be used to analyze this data based on several key parameters.

Why look for miRNAs? transcribed but never translated, form stem loop structures MicroRNAs bind to complimentary seed sites in mRNA in order to block translation or fate the mRNA for destruction Regulation of many biological pathways Important for understanding genome function

Can I identify clusters of candidate miRNAs in the C. mitchelli genome? C. mitchelli (rattlesnake) whole genome shotgun sequence of C. mitchelli assembled in 2013 was obtained from the NCBI genome database. consists of 473,380 contigs, the 50 longest contigs were converted to text files analysed for possible miRNA sequences.

The Approach Candidate miRNA info for each contig miRNAFold Candidates were identified using miRNAFold, an Ab initio miRNA prediction tool.

miRNAFold: an Ab initio miRNA prediction tool output of this program contains the start and stop position on the contig, the size of the miRNA sequence identified, the minimum free energy of the structure, the RNA sequence and two visual representation of the stem-loop structure

The Approach Candidate miRNA info for each contig miRNAFold Text files The result files containing all candidate miRNA data for each of the 50 contigs were manually converted to text files for further computational processing.

The Approach Candidate miRNA info for each contig miRNAFold Text files program stores the text files containing the information for candidate miRNA information into a dictionary and then creates four line plots for each contig which map the miRNAs first part of my program was made in collaboration with Elaine Kushkowski The first step of the program is to import a text file with a list of the contig numbers, which is used to open each contig text file (OpenFile function). Then, for each contig our program stripped the miRNAFold output files of all unnecessary information stored all candidate miRNA information in a dictionary for each contig. Contig dictionary refined by minimum free energy threshold of miRNA hairpin structure

Candidate miRNAs library can be altered based on threshold free energy Examples of candidate miRNA sequence identified based on secondary hairpin structures from C. mitchellii genome minimum free energy values of (a) -20.1 , (b) -60.5 and (c) -96.3 kcal/mol . Candidates were identified using the Ab initio miRNA prediction tool miRNAFold. Green nucleotides represent a complementary Watson-Crick base pairing in the stem, yellow nucleotides represent non-paired segments in the stem, and the blue nucleotides represent the sequence that make up the loop structure. Shows various levels of stability (bottom most stable)

The Approach Candidate miRNA info for each contig miRNAFold Text files Free Energy -60 kcal/mol candidate miRNA dictionaries were then refined to exclude miRNAs that do not fall below a threshold free energy value, which is set manually in the main function (lowest_freeEnergy function). purpose of this function is to be able to exclude miRNAs that have a high free energy and are unstable. narrowed candidate miRNA dictionaries where then stored in an outer dictionary by contig number (makeDictionary function) gives us a dictionary of dictionaries where the miRNA sequences and associated information such as starting indices, ending indices, and free energy are stored according to contig number and starting index keys.

The Approach -60 kcal/mol Candidate miRNA info for each contig miRNAFold Text files Free Energy -60 kcal/mol Plotting second part of the program was done individually For each contig in the dictionary, all candidate miRNAs with a free energy value below -60 kcal/mol were plotted by starting and stopping indices on a line plot. Four plots were created for every contig, one plotted against length of miRNA sequence, one plotted against the free energy value of the miRNA, one plotted against the GC content of each miRNA and one plotted against the miRNA number (Plotline function).

Contig 2426 shows a cluster of candidate miRNAs around 33,000. this cluster has an especially high GC content (0.75 – 0.8) compared to the rest of the miRNAs, and to this section of the genome as a whole (which has a GC content of 0.67). This same cluster has similar lengths and hairpin free energies, which implies that in the GC rich region of the contig, the region that overlaps between all the miRNAs in the cluster forms a stable stem loop structure.

A similar trend is seen in contig 62250 has two very distinct candidate miRNA clusters and several smaller, more dispersed clusters The miRNAs in the cluster around 15,000 have a GC content above 0.7, but vary greatly in length and free energy.

Contig 3109 shows four distinct clusters of miRNAs. This trend is especially apparent on the miRNA number graph, which shows verticle clusters in those regions. These clusters have overlapping regions, but vary widely in length and GC content. Within clusters, their hairpin structure energies are relatively similar. These candidate clusters imply that there is likely a very stable hairpin structure in the overlapping region of each of these clusters. This may imply that theses hairpin structures are more likely to be pre-miRNAs.

The previous three contigs all contained many candidate miRNAs, but many of the contigs contained few if any candidate miRNAs. Contig 62250 contains few miRNAs, but shows a distinct clustering of the candidates that were identified within it. MicroRNAs are commonly located in clusters throughout the genome, as more than one miRNA may be expressed from the same primary miRNA transcript (pri-miRNA), and the clustering observed in this contig follows this pattern. From this data alone, we cannot putatively identify any sequences, but rather provide a starting point for future miRNA identification. Hairpin structure alone is not enough to assign any sort of identification to these sequences, and limiting our results by free energy of these structures may actually exclude possible pre-miRNAs that meet other criteria. this data is based on an analysis of only 0.3% of the C. mitchellii genome, so running this program on more of the contigs could reveal important trends in candidate miRNA distribution. a location based approach to analyzing this dictionary of candidates would narrow results and identify clusters. In the future, once the C. mitchellii genome is annotated, our list of candidate miRNAs can be further refined to those that fall outside the protein coding region.