
1 IEEE BIBM 2016
ISMB/ECCB 2017
Chromatin Accessibility Prediction via Convolutional Long Short-Term Memory Networks with k-mer Embedding
Xu Min, Wanwen Zeng, Ning Chen, Ting Chen*, Rui Jiang*
Presenter: Xu Min
Department of Computer Science and Technology, Tsinghua University, Beijing, China

2 Contents
Background
Method
Results
2017/7/24 Xu Min, Tsinghua University

4 Background
What is chromatin accessibility? (Wang et al., 2016)
It is measured by biological assays such as DNase-seq, FAIRE-seq, and ATAC-seq, which are expensive and time-consuming.

5 Previous work
Computational methods mainly fall into two classes:
k-mer-based methods, e.g. kmer-SVM (Lee et al., 2011) and gkm-SVM (Ghandi et al., 2014)
Pros: feature sets handle arbitrary-length sequences
Cons: capture only local motif patterns
Deep learning-based (CNN-based) methods, e.g. DeepBind (Alipanahi et al., 2015), DeepSEA (Zhou and Troyanskaya, 2015), Basset (Kelley et al., 2016), DeepEnhancer (Min et al., 2016)
Pros: detect motifs automatically; superior performance
Cons: rely on one-hot encoding and require fixed-length input

6 Our idea
Regard DNA sequences as sentences.
Sentence classification (Kim, 2014)
Word embedding: Skip-gram (Mikolov et al., 2013) and GloVe (Pennington et al., 2014)

7 Contents
Background
Method
Results

8 Our approach
Convolutional long short-term memory networks with k-mer embedding, combining the two classes of methods above:
We regard each DNA sequence as a sequence of k-mers and train k-mer vectors with GloVe.
We then use a convolutional LSTM network for supervised classification of input sequences.
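The splitting step can be sketched as follows (a minimal illustration; the function name and the toy sequence are ours, and the defaults k=6, stride=2 are the values given later in the experiment setup):

```python
def to_kmers(seq, k=6, stride=2):
    """Split a DNA sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

# A toy 12-bp sequence yields four 6-mers at stride 2.
print(to_kmers("ACGTACGTACGT"))
# ['ACGTAC', 'GTACGT', 'ACGTAC', 'GTACGT']
```

The resulting k-mer lists play the role of tokenized sentences for the embedding step.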

9 Method
Feature learning in three stages (shown as a figure on the slide): k-mer embedding, convolution with pooling, and a bidirectional LSTM.
Classification loss function (shown as a formula on the slide).
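The loss function itself survives only as a figure in this transcript; for a binary accessible-vs-inaccessible label it is presumably the standard binary cross-entropy, sketched below (an assumption on our part, not taken from the slide):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy: -1/N * sum(y*log(p) + (1-y)*log(1-p))."""
    n = len(y_true)
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(y_true, y_pred)) / n

# Two confident, correct predictions give a small loss.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ≈ 0.1054
```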

10 k-mer embedding with GloVe
Embedding representations are learned mainly from the co-occurrence statistics of k-mers.
GloVe cost function: J = sum_{i,j=1}^{V} f(X_ij) (w_i^T w~_j + b_i + b~_j - log X_ij)^2, where X_ij counts how often k-mers i and j co-occur and the weighting function f damps very frequent pairs.
Unsupervised GloVe embedding results (shown as a figure on the slide).
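The co-occurrence statistics that GloVe factorizes can be gathered as below (a sketch under our own naming; the 1/distance weighting follows the GloVe paper, and the default window of 15 matches the setup slide later):

```python
from collections import defaultdict

def cooccurrence(kmer_sentences, window=15):
    """Symmetric co-occurrence counts X_ij over a context window,
    weighted by 1/distance as in the GloVe paper."""
    X = defaultdict(float)
    for sent in kmer_sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), i):
                X[(w, sent[j])] += 1.0 / (i - j)  # count both directions
                X[(sent[j], w)] += 1.0 / (i - j)
    return X

# Toy example: three tokens, window of 2.
X = cooccurrence([["ACGTAC", "GTACGT", "ACGTAC"]], window=2)
print(X[("ACGTAC", "GTACGT")])  # 2.0
```

GloVe then fits k-mer vectors so that w_i^T w~_j + b_i + b~_j approximates log X_ij.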

11 Bidirectional LSTM
Produces fixed-length output features.
Learns long-range relationships in DNA sequences.
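Why a BLSTM yields fixed-length features regardless of input length can be seen in a minimal NumPy sketch (purely illustrative: the paper's model was built in Keras, and the weight layout and the shared forward/backward weights here are our simplifications):

```python
import numpy as np

def lstm_last(x, Wx, Wh, b):
    """Run an LSTM over x of shape (T, d_in); return the last hidden state (d_h,)."""
    d_h = Wh.shape[0]
    h, c = np.zeros(d_h), np.zeros(d_h)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    for t in range(x.shape[0]):
        z = Wx.T @ x[t] + Wh.T @ h + b   # all four gates at once, shape (4*d_h,)
        i, f, o, g = np.split(z, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)  # cell state update
        h = sig(o) * np.tanh(c)               # hidden state
    return h

def blstm_features(x, params):
    """Concatenate last states of a forward and a backward pass:
    the output has length 2*d_h no matter how long the input is."""
    return np.concatenate([lstm_last(x, *params), lstm_last(x[::-1], *params)])
```

Because only the final hidden states are kept, sequences of 5 or 500 k-mers map to the same feature dimensionality, which is what lets the model accept variable-length input.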

12 Novelties
We fuse informative k-mer features into a deep neural network by embedding k-mers into a low-dimensional vector space.
Thanks to the LSTM units, we can handle variable-length DNA sequences as input and capture long-distance dependencies.

13 Contents
Background
Method
Results

14 Experiment setup
Dataset: ENCODE DNase-seq experiments for 6 cell lines.
Train : validation : test split = 0.85 : 0.05 : 0.10

15 Experiment setup
Unsupervised training of k-mer embedding:
k-mer length k=6, splitting stride s=2
GloVe: window size=15, embedding dimension=100
Supervised deep learning architecture:
Keras + Theano
RMSprop: learning rate=0.001, batch size=3000
Early stopping: maximum iterations=60, patience=5
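The early-stopping rule above can be sketched as a plain loop (the function and the toy loss curve are ours; it only mirrors the max-iterations/patience behaviour described on this slide):

```python
def early_stop(val_losses, max_iter=60, patience=5):
    """Return (best_epoch, best_loss), stopping once validation loss
    has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses[:max_iter]):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # patience exhausted, stop training
    return best_epoch, best

# Improvement stalls after epoch 2, so training halts before reaching epoch 8.
print(early_stop([1.0, 0.8, 0.7, 0.9, 0.9, 0.9, 0.9, 0.9, 0.5]))  # (2, 0.7)
```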

16 Model Evaluation (figure-only slide)

17 Model Evaluation (figure-only slide)

18 Visualization of k-mer embedding (figure-only slide)

19 Visualization of k-mer embedding (figure-only slide)

20 Efficacy of k-mer embedding (figure-only slide)

21 Efficacy of convolution and BLSTM (figure-only slide)

22 Sensitivity analysis (figure-only slide)

23 Conclusion
Main contributions:
We introduce an effective embedding representation of input DNA sequences by using the unsupervised learning algorithm GloVe within a deep learning framework.
We handle variable-length sequences as input and capture complex long-range dependencies on them by exploiting the BLSTM network.
We show that our model achieves state-of-the-art performance on sequence classification tasks compared with other recent methods, including gkm-SVM and DeepSEA.

24 References
Alipanahi, B. et al. (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol., 33(8), 831–838.
Ghandi, M. et al. (2014) Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol., 10(7), e1003711.
Kelley, D.R. et al. (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res., 26(7), 990–999.
Kim, Y. (2014) Convolutional neural networks for sentence classification. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), ACL, pp. 1746–1751.
Lee, D. et al. (2011) Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res., 21, 2167–2180.
Mikolov, T. et al. (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (NIPS), Curran Associates, pp. 3111–3119.
Min, X. et al. (2016) DeepEnhancer: predicting enhancers by convolutional neural networks. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, pp. 637–644.
Pennington, J. et al. (2014) GloVe: global vectors for word representation. In: EMNLP, pp. 1532–1543.
Wang, Y. et al. (2016) Modeling the causal regulatory network by integrating chromatin accessibility and transcriptome data. Natl. Sci. Rev., 3(2), 240–251.
Zhou, J. and Troyanskaya, O.G. (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods, 12, 931–934.

25 Thank you! Q&A
Travel fellowship generously supported by the HitSEQ COSI: High Throughput Sequencing Algorithms & Applications.

