Deep Learning in Bioinformatics


Deep Learning in Bioinformatics Asmitha Rathis

Why Bioinformatics? Protein structure, genetic variants, anomaly classification, protein classification, segmentation/splicing.

Why is Deep Learning beneficial? Deep learning models scale to large datasets and are effective at identifying complex patterns in feature-rich datasets. They learn high levels of abstraction through multiple layers of non-linear transformations.

Terms: What are motifs? Short, recurring patterns in DNA that are presumed to have a biological function. What is non-coding DNA? DNA that does not encode protein sequences.

Papers: (1) DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences - Daniel Quang and Xiaohui Xie [2016]. (2) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning - Babak Alipanahi et al. [2015]. (3) Exploiting the past and the future in protein secondary structure prediction - Pierre Baldi et al. [1999].

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. A predictive model for the function of non-coding DNA has enormous benefit for translational research. 98% of the human genome is non-coding DNA, and 93% of disease-associated variants lie in these regions. Previous work: the DeepSEA model. DanQ proposes a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework.

Network Model: A convolution layer captures motifs. A recurrent layer captures the dependencies between the motifs, that is, the regulatory grammar.
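To make the "convolution for motifs" idea concrete, here is a minimal sketch in plain Python. It is not DanQ's implementation: the kernel below is a hand-written, hypothetical filter for the string "TATA", whereas the real model learns many kernels from data. The helper names (one_hot, conv1d_relu, max_pool) are assumptions made for this illustration.

```python
# Toy illustration only: one hand-written filter, not DanQ's trained kernels.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (order A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d_relu(encoded, kernel):
    """Slide one motif kernel along the sequence and apply ReLU to each score."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(max(0.0, total))
    return scores


def max_pool(scores, size):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(scores[i:i + size]) for i in range(0, len(scores), size)]


# Hypothetical kernel that rewards the motif "TATA" (+1 per match, -1 per mismatch).
TATA_KERNEL = [[1.0 if b == m else -1.0 for b in BASES] for m in "TATA"]

activations = conv1d_relu(one_hot("GGTATAGG"), TATA_KERNEL)
pooled = max_pool(activations, 2)
```

The kernel fires only at the window that matches the motif. In the architecture described on this slide, a pooled activation sequence like this one is what the recurrent layer would read in order to model how motifs relate to each other.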

Training Details: Random initialization, with the option of initializing kernels from known motifs. Dropout is included. Training uses the RMSprop algorithm with a minibatch size of 100. Full training takes 60 epochs, and each epoch takes ∼6 h.

Results: ROC curves were calculated for each of the 919 binary targets on the test set. The predicted probability was the average over the forward and reverse-complement sequence pair.
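The evaluation described on this slide can be sketched as follows. This is an illustrative sketch, not the authors' code: toy_predict is a hypothetical stand-in for a trained model, and roc_auc computes the area under the ROC curve for a single binary target using the rank interpretation (the probability that a random positive scores higher than a random negative).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def strand_averaged(predict, seq):
    """Average a model's prediction over the forward and reverse-complement strands."""
    return 0.5 * (predict(seq) + predict(reverse_complement(seq)))


def roc_auc(labels, scores):
    """ROC AUC for one binary target; ties between a positive and a negative count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def toy_predict(seq):
    """Hypothetical model: the fraction of G bases, used only for illustration."""
    return seq.count("G") / len(seq)


averaged = strand_averaged(toy_predict, "GGCA")
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

With 919 targets, a function like roc_auc would be called once per target column. The pairwise loop is quadratic and is meant for clarity, not for test sets of realistic size.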

Results: Precision-recall curve.

Future Work: Better initialization techniques. Half of the kernels are initialized with known motifs from the JASPAR database. Datasets from more cell types.

Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning: DeepBind. DNA- and RNA-binding proteins play a central role in gene regulation, including transcription and alternative splicing. In the field of transcription, sequence specificity usually means how specifically a protein, usually a transcription factor, recognizes its target DNA motif.

Challenges: Data come in qualitatively different forms, e.g. microarray and sequencing data. The quantity of data is very large. Models need to overcome the biases of existing technologies.

Data: For training, DeepBind uses a set of sequences and, for each sequence, an experimentally determined binding score.

Binding score :

Training/Testing Details: Training on in vitro data and testing on in vivo data. In vitro: a procedure performed in a controlled environment outside of a living organism. In vivo: tested on whole, living organisms or cells, usually animals (including humans) and plants.

Results

Analysis of potentially disease-causing genomic variants: Use binding models to identify, group and visualize variants that potentially change protein binding. The importance of each base is shown by the height of its letter. The mutation map indicates how much each possible mutation will increase or decrease the binding score. Example: a cancer risk variant in a MYC enhancer weakens a TCF7L2 binding site.
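A mutation map of the kind described here can be sketched by scoring every single-base substitution and recording the change relative to the original sequence. The sketch below is a simplification under stated assumptions: it uses the plain score difference, and toy_score is a hypothetical scorer (best match count against the made-up motif "GATA") standing in for a trained binding model.

```python
BASES = "ACGT"


def mutation_map(score, seq):
    """Map each base to a row of score changes, one entry per sequence position.

    Entry rows[base][i] is score(seq with position i set to base) - score(seq).
    """
    reference = score(seq)
    rows = {}
    for base in BASES:
        row = []
        for i in range(len(seq)):
            mutant = seq[:i] + base + seq[i + 1:]
            row.append(score(mutant) - reference)
        rows[base] = row
    return rows


def toy_score(seq, motif="GATA"):
    """Hypothetical binding score: best number of matching bases over all windows."""
    best = 0
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        matches = sum(1 for a, b in zip(window, motif) if a == b)
        best = max(best, matches)
    return best


changes = mutation_map(toy_score, "TGATAC")
```

Negative entries mark substitutions that weaken the predicted binding, which is how a risk variant that disrupts a binding site would show up. Positions outside the best-matching window produce zeros in this toy scorer.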

Analysis of Splicing Patterns

Exploiting the past and the future in protein secondary structure prediction: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards understanding its three-dimensional structure as well as its function. Old methods: ML models that do not capture variable long-range information; increasing the size of the input window leads to overfitting.
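The "past and future" idea can be illustrated with a toy bidirectional recurrence: one pass reads the sequence left to right, another reads it right to left, and each position ends up with a pair of states summarizing what lies on either side. This is a scalar sketch with made-up weights (w_state, w_input), not the network from the paper, where the state transitions and the output are themselves neural networks over amino-acid encodings.

```python
import math


def recurrent_pass(inputs, w_state, w_input):
    """Simple scalar recurrence: state_t = tanh(w_state * state_{t-1} + w_input * x_t)."""
    state = 0.0
    states = []
    for x in inputs:
        state = math.tanh(w_state * state + w_input * x)
        states.append(state)
    return states


def bidirectional_states(inputs, w_state=0.5, w_input=1.0):
    """Return (forward, backward) state pairs, one pair per input position."""
    forward = recurrent_pass(inputs, w_state, w_input)
    # Run the same recurrence on the reversed input, then restore the original order.
    backward = recurrent_pass(inputs[::-1], w_state, w_input)[::-1]
    return list(zip(forward, backward))


# A single signal at the first position of a length-3 input.
states = bidirectional_states([1.0, 0.0, 0.0])
```

At the last position the forward state is still non-zero because it remembers the signal from the start, while the backward state is zero because nothing follows. Neither state depends on a fixed window size, which is the contrast with the window-based methods mentioned on the slide.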

Results

Results: Overall performance is close to 76% correct classification with 6 BRNNs. Use a range to limit the size of the window.

Questions: Based on the more recent models and technologies seen in class, which of them can be applied to these problems? Can these techniques be applied to other bioinformatics tasks?