
The Bytes of biological Data Artemis G. Hatzigeorgiou Professor of Bioinformatics Department of Electrical and Computer Engineering University of Thessaly.




1 The Bytes of biological Data. Artemis G. Hatzigeorgiou, Professor of Bioinformatics, Department of Electrical and Computer Engineering, University of Thessaly; Hellenic Institute Pasteur; “Athena” Research Center

2 What is Bioinformatics? Bioinformatics is generally defined as the analysis, prediction, modeling and storage of biological data with the help of computers

3

4

5

6

7 Next Generation Sequencing

8

9 COSTS

10

11 The central dogma

12 What are microRNAs (miRNAs)? [Diagram: Gene B; DNA, transcription, RNA, translation, protein] miRNAs are RNAs about 22 nt long. They post-transcriptionally regulate the expression of protein-coding genes.

13

14 MicroRNAs are involved in: development; stem cell proliferation; division; differentiation; regulation of innate & adaptive immunity; apoptosis; cell signaling; metabolism. Human pathologies: cancer; viral infections; cardiovascular diseases; metabolic disorders; neurological pathologies; psychiatric disorders; renal disease; hepatological conditions; autoimmune diseases; gastroenterological conditions; obesity; reproductive disorders; musculoskeletal disorders; periodontal pathologies

15 Superlinear Increase of known miRNAs and relevant Research

16 Active Pathway Visualization

17 Citation: Wang D, Yan K-K, Sisu C, Cheng C, Rozowsky J, Meyerson W, et al. (2015) Loregic: A Method to Characterize the Cooperative Logic of Regulatory Factors. PLoS Comput Biol 11(4): e1004132. doi:10.1371/journal.pcbi.1004132

18 Location of miRNAs [Figure labels: miR, promoter, Pol2, exon; two arrangements shown, labeled 70% and 30%]

19 Why are the pri-miRNA genes not annotated? Fast degradation in the nucleus. Megraw, M., Baev, V., Rusinov, V., Jensen, S.T., Kalantidis, K., Hatzigeorgiou, A.G. MicroRNA promoter element discovery in Arabidopsis (2006) RNA, 12 (9), pp. 1612-1619.

20 Recognition of Transcription Start Sites for pri-microRNA genes: weight matrices of transcription factors; ChIP-seq data of Pol II occupancy; ChIP-seq data of histone modifications (H3K4me3); Cap Analysis of Gene Expression (CAGE)
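
As an illustration of the first evidence type on this slide, a transcription factor weight matrix can be scanned along a sequence to score every window. The sketch below is a minimal example under stated assumptions: the count matrix `COUNTS`, the uniform background and the test sequence are all invented for illustration and are not taken from the presentation.

```python
import math

# Hypothetical base counts for a 4-bp motif (illustrative values only).
COUNTS = [
    {"A": 2, "C": 1, "G": 1, "T": 16},
    {"A": 16, "C": 1, "G": 1, "T": 2},
    {"A": 2, "C": 1, "G": 1, "T": 16},
    {"A": 16, "C": 1, "G": 1, "T": 2},
]
BACKGROUND = 0.25  # assumed uniform base composition


def to_log_odds(counts, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (start, score) for every window of the matrix width."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


PWM = to_log_odds(COUNTS)
hits = scan("GGCTATAAGG", PWM)
best_position, best_score = max(hits, key=lambda hit: hit[1])
print(best_position, round(best_score, 2))
```

Windows with a high positive score resemble the motif more than the background; in practice a score threshold would have to be chosen per matrix.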

21 ChIP Sequencing Visualization [Tracks: H3K4me3, Pol2] Drawback: wide range of predictions

22 Experimental identification of miRNA TSSs: a Drosha null/conditional-null (Drosha LacZ/e4COIN) mouse model was generated using the conditional by inversion (COIN) methodology from Aris Economides at Regeneron Pharmaceuticals. Economides, A.N. et al. Conditionals by inversion provide a universal method for the generation of conditional alleles. Proceedings of the National Academy of Sciences Aug 20;110(34):E3179-88 (2013).

23 RNA-seq coverage over the Mir17hg lncRNA locus (8,856 bp) [Figure: annotated genes Mir17hg, Mir17, Mir18, Mir19a, Mir20a, Mir19b-1, Mir92-1; tracks of normalized read count: GSM973235 WT mESCs, 180M reads; Drosha -/- mESCs, 27M reads; Drosha +/+ mESCs, 19M reads] RNA-seq read depth is essential!

24 ...but deep RNA-seq is not enough [Tracks: miRNAs, putative TSS, RNA-seq coverage] Which one is correct?

25 ChIP-seq information can effectively reduce putative TSSs [Tracks: miRNAs, putative TSS, RNA-seq coverage, H3K4me3, Pol2, TF footprints]

26 Algorithm - First step: identify candidate TSSs. Raw RNA-seq reads: map reads on the reference genome (mm10). Reads tend to cluster over the expressed genomic regions. Apply a sliding window around miRNAs (miRNA coding regions). Filter the candidate transcription start sites (putative TSSs).
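
The sliding-window idea on this slide can be sketched as follows. This is only a hedged illustration: the slide does not give the actual window size, thresholds or filtering rules, so `window`, `min_mean`, `min_fold` and the toy coverage vector are assumptions. The sketch flags positions where mean coverage rises sharply from the upstream window to the downstream window, then keeps the strongest position in each run of adjacent hits. It assumes a plus-strand locus.

```python
def window_means(coverage, pos, window):
    """Mean coverage in the windows just upstream and downstream of pos."""
    upstream = sum(coverage[pos - window:pos]) / window
    downstream = sum(coverage[pos:pos + window]) / window
    return upstream, downstream


def candidate_tss(coverage, window=5, min_mean=3.0, min_fold=4.0):
    """Return candidate TSS positions from a per-base coverage list.

    All thresholds are hypothetical defaults chosen for the toy data.
    """
    flagged = []
    for pos in range(window, len(coverage) - window + 1):
        up, down = window_means(coverage, pos, window)
        # The +1.0 pseudocount keeps the fold test defined when upstream is 0.
        if down >= min_mean and down >= min_fold * (up + 1.0):
            flagged.append((pos, down - up))

    # Collapse runs of adjacent flagged positions to the largest jump.
    runs = []  # each entry: (best_pos, best_jump, last_pos_in_run)
    for pos, jump in flagged:
        if runs and pos - runs[-1][2] == 1:
            best_pos, best_jump, _ = runs[-1]
            if jump > best_jump:
                runs[-1] = (pos, jump, pos)
            else:
                runs[-1] = (best_pos, best_jump, pos)
        else:
            runs.append((pos, jump, pos))
    return [best_pos for best_pos, _, _ in runs]


# Toy coverage: two expressed blocks separated by silent regions.
coverage = [0] * 10 + [8] * 10 + [0] * 5 + [9] * 10
candidates = candidate_tss(coverage)
print(candidates)
```

On real data the coverage would come from mapped reads, and the surviving candidates would still need the filtering and scoring described on the following slides.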

27 Algorithm - second step: training of SVMs. An algorithm that can learn from examples: machine learning. Here we used Support Vector Machines, a supervised machine learning approach. Training with: positive examples (protein coding TSS); negative examples (random intergenic locations, flanking positions)
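
To make the training setup concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is not the presenters' implementation: the slide does not specify the kernel, features or software, so the two-dimensional feature vectors (imagined as H3K4me3 and Pol II signal), the learning rate and the regularization strength are all assumptions made for illustration.

```python
# Hypothetical features: (H3K4me3 signal, Pol II signal) around a site.
POSITIVES = [(2.0, 2.5), (3.0, 2.0), (2.5, 3.0)]  # stand-ins for protein-coding TSSs
NEGATIVES = [(0.2, 0.1), (0.5, 0.3), (0.1, 0.6)]  # stand-ins for intergenic/flanking sites


def decision(w, b, x):
    """Signed distance-like score of sample x under the linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, epochs=200, lr=0.05, reg=0.01):
    """Sub-gradient descent on reg/2 * |w|^2 + hinge loss (labels are +1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * decision(w, b, x) < 1:
                # Sample violates the margin: move the boundary toward it.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


samples = POSITIVES + NEGATIVES
labels = [1] * len(POSITIVES) + [-1] * len(NEGATIVES)
w, b = train_linear_svm(samples, labels)
print(predict(w, b, (3.0, 3.0)), predict(w, b, (0.0, 0.0)))
```

A production setting would normally rely on an established SVM library and on held-out data for evaluation; the loop above only shows how positive and negative examples shape the decision boundary.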

28 Algorithm - final step. Algorithm overview: first step, second step, final step

29 Comparison between microTSS and available algorithms [Figure: Precision against distance threshold for Marson et al, S-Peaker, PROmiRNA and microTSS]
Algorithms’ Precision and Sensitivity at 1 kbp distance threshold from validated TSSs in mESCs (N=47):
Algorithm       Sensitivity      Precision
Marson et al    54% (20/37)      64.5% (20/31)
PROmiRNA        78.7% (37/47)    25.4% (95/373)
S-Peaker        76.5% (36/47)    18.8% (77/409)
microTSS        93.6% (44/47)    100% (44/44)
No prediction filtering based on distance. Predictions located less than 1,000 bp from the validated TSS are considered True Positives and the rest are considered False Positives. Precision = TP / (TP+FP). Sensitivity = Correct Predictions / Total Correct
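
The evaluation rule on this slide can be written out as a short sketch. It follows the stated definitions (a prediction closer than 1,000 bp to a validated TSS is a true positive; sensitivity counts validated TSSs that were recovered), but the function name `evaluate`, the single-chromosome coordinate lists and the toy positions are assumptions for illustration.

```python
def evaluate(predicted, validated, max_distance=1000):
    """Precision and sensitivity under a distance threshold.

    predicted, validated: lists of genomic positions on one chromosome
    (a simplification; real data would also carry chromosome and strand).
    """
    # True positives: predictions with a validated TSS closer than max_distance.
    tp = sum(
        1 for p in predicted
        if any(abs(p - v) < max_distance for v in validated)
    )
    # Recovered: validated TSSs with at least one prediction close enough.
    recovered = sum(
        1 for v in validated
        if any(abs(p - v) < max_distance for p in predicted)
    )
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(validated) if validated else 0.0
    return precision, sensitivity


# Toy positions, invented for the example.
validated = [1000, 5000, 9000]
predicted = [1200, 1300, 5900, 20000]
precision, sensitivity = evaluate(predicted, validated)
print(precision, round(sensitivity, 3))
```

Counting true-positive predictions and recovered validated TSSs separately matches the table, where an algorithm can report many predictions per validated site (for example 95 true positives covering 37 of 47 sites).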

30

31 Software on microRNA.gr

32 Maragkakis M, Vergoulis T, Alexiou P, Reczko M et al. DIANA-microT Web server upgrade supports Fly and Worm miRNA target prediction and bibliographic miRNA to disease association. Nucleic Acids Research, 2011. Listed resources: miRNA target predictions (microT); miRNA validated targets (TarBase); miRNA genomics (miRGen); miRNA experimental supported targets on protein coding genes (TarBase); miRNA experimental supported targets on Long Non Coding genes (LncBase); KEGG pathways analysis (mirPath); miRNA targets gene enrichment analysis (mirExTra); miRNA to disease associations; automatic bibliographic searches; miRNA naming history analysis; extended connectivity to online databases. Group labels on the slide: Primary data; Meta analysis; Other projects of DIANA lab on microrna.gr

33 Database of experimentally supported targets: DIANA-TarBase. Initially released in 2006, it was the first database to catalog published, experimentally validated miRNA:gene interactions. With more than 500,000 entries, it is the largest repository of experimentally validated miRNA:gene interactions. Last update: DIANA-TarBase v7, http://www.microrna.gr/tarbase S. Vlachos, M. D. Paraskevopoulou, D. Karagkouni, G. Georgakilas, T. Vergoulis, I. Kanellos, I-L. Anastasopoulos, S. Maniou, K. Karathanou, D. Kalfakakou, A. Fevgas, T. Dalamagas and A. G. Hatzigeorgiou. DIANA-TarBase v7.0: indexing more than half a million experimentally supported miRNA:mRNA interactions. Nucl. Acids Res. (2014)

34 Semi-Automatic Curation Pipeline: automatic detection of microRNA-related articles; formation of XML-based efficient tree-like structures; detection of microRNA mentions; detection of gene mentions; detection of miRNA-gene-interaction triplets; text scoring; meta-data insertion and mark-up; score-based ranking and search capabilities
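
The mention-detection and triplet steps listed on this slide could look roughly like the sketch below. It is a simplified stand-in, not the TarBase pipeline: the miRNA name pattern, the tiny gene dictionary `GENES`, the interaction keyword list and the sentence-level co-occurrence rule are all assumptions chosen to keep the example short.

```python
import re

# Assumed pattern for names such as hsa-miR-21-5p, miR-155 or let-7a.
MIRNA = re.compile(r"\b(?:[a-z]{3}-)?(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b")
# Hypothetical gene dictionary and interaction keywords.
GENES = {"PTEN", "KRAS", "TP53"}
INTERACTION_WORDS = ("represses", "targets", "inhibits", "reduced")


def find_triplets(text):
    """Return (miRNA, gene, interaction word) triplets co-occurring in a sentence."""
    triplets = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mirnas = MIRNA.findall(sentence)
        genes = [
            token for token in re.findall(r"\b[A-Z][A-Z0-9]+\b", sentence)
            if token in GENES
        ]
        words = [
            word for word in INTERACTION_WORDS
            if re.search(r"\b" + word + r"\b", sentence)
        ]
        for mirna in mirnas:
            for gene in genes:
                for word in words:
                    triplets.append((mirna, gene, word))
    return triplets


text = (
    "We show that hsa-miR-21-5p represses PTEN in HeLa cells. "
    "Overexpression of let-7a reduced KRAS levels. "
    "No interaction was tested for TP53."
)
triplets = find_triplets(text)
print(triplets)
```

Sentence-level co-occurrence over-generates on real abstracts, which is presumably why the pipeline on the slide adds text scoring and keeps a curator in the loop.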

35 Growth of interactions per method. Evaluation in Poster #66

36 http://www.microrna.gr/tarbase

37 Integration in ENSEMBL, the European Browser for Genomes in EBI

38 Long Non Coding RNAs: LncBase (http://www.microrna.gr/LncBase) is the largest available repository of miRNA-lncRNA interactions. The Experimental Module contains more than 5,000 interactions between 2,958 lncRNAs and 120 miRNAs. The Prediction Module contains detailed information for more than 10 million interactions between 56,097 lncRNAs and 3,078 miRNAs. Integration into RNAcentral (EBI). Paraskevopoulou, M.D., Georgakilas, G., Kostoulas, N., Reczko, M., Maragkakis, M., Dalamagas, T.M., Hatzigeorgiou, A.G. DIANA-LncBase: Experimentally verified and computationally predicted microRNA targets on long non-coding RNAs (2013) Nucleic Acids Research, 41 (D1), pp. D239-D245.

39

40 miRBase also interconnects entries with external resources

41 DIANA-Tools Visit us @ www.microrna.gr! More than 130,000 visits per year, based on Google Analytics! Integration of microT & TarBase in miRBase First release

42

43

44 Discussion: Check the citations of databases and web servers before publishing. For example, questions such as these could be added for reviewers: Has the researcher properly cited the data used? Are the data used for training and testing available? Can the data be reproduced? Availability of databases through time (diachronic data). Credibility for diachronic databases and web services. Funding: Project “TOM”, which is implemented under the "ARISTEIA" Action of the "OPERATIONAL PROGRAMME EDUCATION AND LIFELONG LEARNING" and is co-funded by the European Social Fund (ESF) and National Resources.

