Published by Amberlynn Lucinda Ramsey. Modified over 9 years ago.
1
The Bytes of Biological Data
Artemis G. Hatzigeorgiou
Professor of Bioinformatics, Department of Electrical and Computer Engineering, University of Thessaly
Hellenic Institute Pasteur
“Athena” Research Center
2
What is Bioinformatics?
Bioinformatics is generally defined as the analysis, prediction, modeling and storage of biological data with the help of computers.
7
Next Generation Sequencing
9
COSTS
11
The central dogma
12
What are microRNAs (miRNAs)?
[Figure: Gene B; DNA, transcription, RNA, translation, protein]
miRNAs are RNAs about 22 nt long. They post-transcriptionally regulate the expression of protein-coding genes.
14
MicroRNAs are involved in …
Development, stem cell proliferation, division, differentiation, regulation of innate & adaptive immunity, apoptosis, cell signaling, metabolism
Human pathologies: cancer, viral infections, cardiovascular diseases, metabolic disorders, neurological pathologies, psychiatric disorders, renal disease, hepatological conditions, autoimmune diseases, gastroenterological conditions, obesity, reproductive disorders, musculoskeletal disorders, periodontal pathologies
15
Superlinear Increase of known miRNAs and relevant Research
16
Active Pathway Visualization
17
Citation: Wang D, Yan K-K, Sisu C, Cheng C, Rozowsky J, Meyerson W, et al. (2015) Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors. PLoS Comput Biol 11(4): e1004132. doi:10.1371/journal.pcbi.1004132
18
Location of miRNAs
[Figure: two miRNA locus diagrams labeled with miR, promoter, Pol2 and exon; proportions shown: 70% and 30%]
19
Why are the pri-miRNA genes not annotated?
Fast degradation in the nucleus
Megraw, M., Baev, V., Rusinov, V., Jensen, S.T., Kalantidis, K., Hatzigeorgiou, A.G. MicroRNA promoter element discovery in Arabidopsis (2006) RNA, 12 (9), pp. 1612-1619.
20
Recognition of Transcription Start Sites for pri-microRNA genes
Weight matrices of transcription factors
ChIP-seq data of Pol II occupancy
ChIP-seq data of histone modifications (H3K4me3)
Cap Analysis of Gene Expression (CAGE)
21
ChIP Sequencing Visualization
[Figure: H3K4me3 and Pol2 ChIP-seq tracks]
Drawback: wide range of predictions
22
Experimental identification of miRNA TSSs
A Drosha null/conditional-null (Drosha LacZ/e4COIN) mouse model was generated using the conditional by inversion (COIN) methodology of Aris Economides at Regeneron Pharmaceuticals.
Economides, A.N. et al. Conditionals by inversion provide a universal method for the generation of conditional alleles. Proceedings of the National Academy of Sciences Aug 20;110(34):E3179-88 (2013).
23
RNA-seq coverage over the Mir17hg lncRNA locus (8,856 bp)
[Figure: normalized read count tracks over Mir17hg, with Mir17, Mir18, Mir19a, Mir20a, Mir19b-1 and Mir92-1 annotated. Tracks: GSM973235 WT mESCs (180M reads); Drosha -/- mESCs (27M reads); Drosha +/+ mESCs (19M reads)]
RNA-seq read depth is essential!
24
…but (deep RNA-seq is) not enough
[Figure: miRNAs, putative TSSs, RNA-seq coverage]
Which one is correct?
25
ChIP-seq information can effectively reduce the number of putative TSSs
[Figure: miRNAs, putative TSSs, RNA-seq coverage, H3K4me3, Pol2, TF footprints]
26
Algorithm - First step: identify candidate TSSs
Raw RNA-seq reads
Map reads on the reference genome (mm10)
Reads tend to cluster over the expressed genomic regions
Apply a sliding window around miRNAs
Filter the candidate transcription start sites (putative TSSs)
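The first step can be illustrated with a minimal sketch. It is not the microTSS implementation: it assumes per-base coverage is already available as a list, handles only the plus strand, and uses hypothetical names and thresholds (`expressed_islands`, `candidate_tss`, `min_depth`, `max_gap`, `window`). Covered positions are merged into "islands", and the 5' edge of each island inside a window upstream of the miRNA is kept as a candidate TSS.

```python
from typing import List, Tuple


def expressed_islands(coverage: List[int], min_depth: int = 1,
                      max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals with depth >= min_depth.

    Covered positions separated by at most max_gap uncovered bases are
    merged into one island (reads cluster over expressed regions).
    """
    islands: List[Tuple[int, int]] = []
    start = None
    last_covered = None
    for pos, depth in enumerate(coverage):
        if depth < min_depth:
            continue
        if start is None:
            start = pos
        elif pos - last_covered - 1 > max_gap:
            # Gap too long: close the current island and open a new one.
            islands.append((start, last_covered + 1))
            start = pos
        last_covered = pos
    if start is not None:
        islands.append((start, last_covered + 1))
    return islands


def candidate_tss(coverage: List[int], mirna_start: int, window: int,
                  min_depth: int = 2, max_gap: int = 1) -> List[int]:
    """Island 5' edges lying within `window` bases upstream of the miRNA."""
    lower = max(0, mirna_start - window)
    return [s for s, _ in expressed_islands(coverage, min_depth, max_gap)
            if lower <= s <= mirna_start]


# Toy coverage vector (made-up values), miRNA starting at position 15.
toy_coverage = [0, 0, 5, 6, 0, 7, 0, 0, 0, 3, 4, 4, 0, 0, 2, 2, 2, 2]
print(candidate_tss(toy_coverage, mirna_start=15, window=10))  # [9, 14]
```

Widening the window to 20 bases also returns the island edge at position 2, which is why the candidates still need the filtering of the later steps.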
27
Algorithm - second step: Training of SVMs
An algorithm that can learn from examples: machine learning
Here we used Support Vector Machines: a supervised machine learning approach.
Training with:
positive examples (protein-coding TSSs)
negative examples (random intergenic locations, flanking positions)
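To make the idea of training on positive and negative examples concrete, here is a small sketch of a linear SVM trained with the Pegasos stochastic sub-gradient method. This is a stand-in: the slide does not say which SVM formulation, kernel or library was used. The two features (standardized H3K4me3 and Pol2 signal around a candidate position), the toy values and all function names are assumptions for illustration only, and the model has no bias term, so it presumes standardized features.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 100,
                     seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2 * |w|^2 + mean hinge loss (no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = y[i] * dot(w, X[i]) < 1.0  # margin check on old w
            # Shrink step from the regularizer: (1 - eta * lam) = 1 - 1/t.
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if violated:
                # Hinge-loss sub-gradient step toward the example.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Vector, x: Vector) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


# Made-up toy data: +1 = protein-coding TSS, -1 = intergenic/flanking site.
# Hypothetical features: (standardized H3K4me3 signal, standardized Pol2 signal).
X_train = [(1.2, 0.8), (0.9, 1.5), (1.6, 1.1), (0.7, 0.6),
           (-1.1, -0.9), (-0.6, -1.3), (-1.4, -0.7), (-0.8, -0.5)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
print([predict(w, x) for x in X_train])
```

A trained model of this kind would then be applied to the candidate TSSs from the first step rather than to the training positions themselves.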
28
Algorithm - final step
[Figure: algorithm overview showing the first step, second step and final step]
29
Comparison between microTSS and available algorithms
[Figure: precision versus distance threshold for Marson et al., S-Peaker, PROmiRNA and microTSS; no prediction filtering based on distance]
Algorithms’ precision and sensitivity at 1 kbp distance threshold from validated TSSs in mESCs (N=47)

Algorithm       Sensitivity      Precision
Marson et al.   54% (20/37)      64.5% (20/31)
PROmiRNA        78.7% (37/47)    25.4% (95/373)
S-Peaker        76.5% (36/47)    18.8% (77/409)
microTSS        93.6% (44/47)    100% (44/44)

Predictions located less than 1,000 bp from the validated TSS are considered True Positives and the rest are considered False Positives.
Precision = TP / (TP+FP)
Sensitivity = Correct Predictions / Total Correct
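The two measures can be computed as in the sketch below. It assumes predictions and validated TSSs are given as genomic coordinates keyed by miRNA, that every prediction is counted individually for precision, and that a validated TSS counts as recovered when at least one prediction falls within the threshold. These counting rules are an interpretation of the slide, and the function name, miRNA keys and coordinates are made up.

```python
from typing import Dict, List, Tuple


def evaluate(predictions: Dict[str, List[int]], validated: Dict[str, int],
             max_dist: int = 1000) -> Tuple[float, float]:
    """Return (precision, sensitivity) at a distance threshold in bp.

    A prediction closer than max_dist to the validated TSS of its miRNA
    is a true positive; any other prediction for that miRNA is a false
    positive. Predictions for miRNAs without a validated TSS are ignored.
    """
    tp = fp = recovered = 0
    for mirna, true_pos in validated.items():
        preds = predictions.get(mirna, [])
        hits = [p for p in preds if abs(p - true_pos) < max_dist]
        tp += len(hits)
        fp += len(preds) - len(hits)
        if hits:
            recovered += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = recovered / len(validated) if validated else 0.0
    return precision, sensitivity


# Hypothetical coordinates for three miRNAs.
validated_tss = {"mir-a": 1000, "mir-b": 5000, "mir-c": 9000}
predicted_tss = {"mir-a": [1200, 3000], "mir-b": [5999], "mir-c": []}

precision, sensitivity = evaluate(predicted_tss, validated_tss)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f}")
```

In this toy case two of three predictions are within 1,000 bp and two of three validated TSSs are recovered, so both values are 2/3.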
31
Software on microRNA.gr
32
Other projects of DIANA lab on microrna.gr
Groupings shown on the slide: primary data; meta analysis
miRNA target predictions (microT)
miRNA validated targets (TarBase)
miRNA experimentally supported targets on protein-coding genes (TarBase)
miRNA experimentally supported targets on long non-coding genes (LncBase)
miRNA genomics (miRGen)
KEGG pathways analysis (mirPath)
miRNA target gene enrichment analysis (mirExTra)
miRNA to disease associations
automatic bibliographic searches
miRNA naming history analysis
extended connectivity to online databases
Maragkakis M, Vergoulis T, Alexiou P, Reczko M et al. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Research, 2011.
33
Database of experimentally supported targets: DIANA-TarBase
Initially released in 2006: the first database to catalog published, experimentally validated miRNA:gene interactions
With more than 500,000 entries, the largest repository of experimentally validated miRNA:gene interactions
Last update: DIANA-TarBase v7
http://www.microrna.gr/tarbase
S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I-L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucl. Acids Res. (2014)
34
Semi-Automatic Curation Pipeline
Automatic detection of microRNA-related articles
Formation of efficient XML-based tree-like structures
Detection of microRNA mentions
Detection of gene mentions
Detection of miRNA-gene-interaction triplets
Text scoring
Meta-data insertion and mark-up
Score-based ranking and search capabilities
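A toy version of the mention-detection, triplet and ranking stages might look like the following. It is only a sketch: the regular expression, the interaction word list, the dictionary-based gene lookup and the count-based score are all assumptions, not the rules of the DIANA curation pipeline, and the example sentences are invented.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical pattern for names such as miR-21, hsa-miR-155-5p or let-7a.
MIRNA_RE = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b")
# Hypothetical list of words that signal an interaction.
INTERACTION_WORDS = ("targets", "represses", "regulates", "inhibits")


def extract_triplets(text: str,
                     gene_symbols: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (miRNA, interaction word, gene) triplets found per sentence."""
    known_genes = set(gene_symbols)
    triplets = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mirnas = MIRNA_RE.findall(sentence)
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        genes = sorted(tokens & known_genes)
        verbs = [v for v in INTERACTION_WORDS if v in tokens]
        if mirnas and genes and verbs:
            for mirna in mirnas:
                for gene in genes:
                    triplets.append((mirna, verbs[0], gene))
    return triplets


def rank_pairs(triplets: List[Tuple[str, str, str]]):
    """Score each miRNA:gene pair by its number of supporting sentences."""
    return Counter((m, g) for m, _, g in triplets).most_common()


toy_text = ("miR-21 targets PTEN in hepatocytes. "
            "We also confirm that miR-21 represses PTEN. "
            "hsa-miR-155-5p regulates SOCS1. "
            "PTEN is a tumor suppressor.")
toy_triplets = extract_triplets(toy_text, {"PTEN", "SOCS1"})
print(rank_pairs(toy_triplets))
```

Ranked pairs like these would still go to a human curator, which is what makes the pipeline semi-automatic.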
35
Growth of interactions per method
Evaluation in Poster #66
36
http://www.microrna.gr/tarbase
37
Integration in ENSEMBL, the European Browser for Genomes in EBI
38
Long Non-Coding RNAs
LncBase (http://www.microrna.gr/LncBase) is the largest available repository of miRNA-lncRNA interactions.
The Experimental Module contains more than 5,000 interactions between 2,958 lncRNAs and 120 miRNAs.
The Prediction Module contains detailed information for more than 10 million interactions between 56,097 lncRNAs and 3,078 miRNAs.
Integration into RNAcentral (EBI)
Paraskevopoulou, M.D., Georgakilas, G., Kostoulas, N., Reczko, M., Maragkakis, M., Dalamagas, T.M., Hatzigeorgiou, A.G. DIANA-LncBase: Experimentally verified and computationally predicted microRNA targets on long non-coding RNAs (2013) Nucleic Acids Research, 41 (D1), pp. D239-D245.
40
miRBase also interconnects entries with external resources:
41
DIANA-Tools
Visit us @ www.microrna.gr!
More than 130,000 visits per year, based on Google Analytics!
Integration of microT & TarBase in miRBase
First release
44
Discussion
Check the citations of databases/web servers before publishing.
For example, questions such as these could be added for reviewers:
Has the researcher properly cited the data used?
Are the data used for training and testing available?
Can the data be reproduced?
Availability of databases through time: diachronic data
Credibility for diachronic databases/web services
Funding: Project “TOM”, which is implemented under the "ARISTEIA" Action of the "OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING" and is co-funded by the European Social Fund (ESF) and National Resources.