1
Deep Learning for Bacteria Event Identification
Jin Mao, Postdoc, School of Information, University of Arizona. Dec 13th, 2016
2
AGENDA
Recurrent Neural Network
Bacteria Event Identification Task
Methods by TurkuNLP
Results
3
Recurrent Neural Network
A simple feed-forward NN
4
Recurrent Neural Network
1. At time t, the input is formed by concatenating the current word vector w(t) with the output of the context (hidden) layer from the previous time step.
2. The output is a probability distribution over the next word.
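As a concrete illustration, here is a minimal NumPy sketch of one time step of this recurrent language model, following the equations of Mikolov et al. (2010); the vocabulary size, hidden size, and random weights are placeholders, not values from the slides.

```python
import numpy as np

vocab_size, hidden_size = 10000, 100

# Randomly initialized parameters; in practice these are learned by back-propagation.
U = np.random.randn(hidden_size, vocab_size) * 0.01   # input word -> context layer
W = np.random.randn(hidden_size, hidden_size) * 0.01  # previous context -> context layer
V = np.random.randn(vocab_size, hidden_size) * 0.01   # context layer -> output layer

def step(word_id, s_prev):
    """Consume one word; return the new context vector and the next-word distribution."""
    w = np.zeros(vocab_size)
    w[word_id] = 1.0                                   # 1-of-N encoding of the word w(t)
    s = 1.0 / (1.0 + np.exp(-(U @ w + W @ s_prev)))    # s(t) = sigmoid(U w(t) + W s(t-1))
    logits = V @ s
    y = np.exp(logits - logits.max())
    y /= y.sum()                                       # y(t) = softmax(V s(t)): P(next word)
    return s, y

s = np.zeros(hidden_size)            # empty context at t = 0
s, y = step(word_id=42, s_prev=s)    # the context now summarizes everything seen so far
```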
5
Recurrent Neural Network
At time t, the computation is based on all the information collected up to time t-1. One word is processed per time step, and each epoch is one iteration over the training data. Refer to Mikolov et al. (2010) for more details.
6
Bacteria Event Identification Task
BB3-event 2016
Three entity types: Bacteria, Habitat, Geographical
A single event type: Lives_In
7
Bacteria Event Identification Task
Lives_In Event
Given the text and the annotated entities:
T13 Habitat patients with WARI
T15 Bacteria Mycoplasma pneumoniae
…
T18 Bacteria Mycoplasma pneumoniae
T19 Habitat school age children with wheezing illness
Identify the events/relationships:
R1 Lives_In Bacteria:T15 Location:T13
R2 Lives_In Bacteria:T18 Location:T19
Note: not every bacteria and habitat entity appears in an event.
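To make the annotation format concrete, here is a minimal sketch (not TurkuNLP code) that reads BioNLP-style standoff lines like the ones above into entity and relation records; the character offsets in the example are placeholders, and the real BB3 files carry additional detail omitted here.

```python
def parse_standoff(lines):
    """Toy parser for standoff lines: T* rows are entities, R* rows are relations."""
    entities, relations = {}, []
    for line in lines:
        parts = line.strip().split("\t")
        if parts[0].startswith("T"):                      # entity: id, "<type> <offsets>", text
            entities[parts[0]] = {"type": parts[1].split()[0], "text": parts[-1]}
        elif parts[0].startswith("R"):                    # relation: id, "<type> Arg:Tn Arg:Tm"
            rel_type, *args = parts[1].split()
            relations.append({"id": parts[0], "type": rel_type,
                              "args": dict(a.split(":") for a in args)})
    return entities, relations

entities, relations = parse_standoff([
    "T13\tHabitat 10 28\tpatients with WARI",
    "T15\tBacteria 40 61\tMycoplasma pneumoniae",
    "R1\tLives_In Bacteria:T15 Location:T13",
])
print(relations[0]["args"])   # {'Bacteria': 'T15', 'Location': 'T13'}
```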
8
Preprocessing
Methods by TurkuNLP. The TEES system (Björne and Salakoski, 2013) performs tokenization, POS tagging, and parsing, and removes cross-sentence relations: TEES can extract associations only between entities that occur in the same sentence.
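A hedged sketch of that same-sentence restriction: candidate Lives_In examples are formed only from bacteria and habitat/geographical entities in the same sentence. The dictionary fields ('type', 'sentence') are assumptions for illustration, not the TEES data model.

```python
from itertools import product

def candidate_pairs(entities):
    """Pair every bacteria mention with every location mention in the same sentence."""
    bacteria  = [e for e in entities if e["type"] == "Bacteria"]
    locations = [e for e in entities if e["type"] in ("Habitat", "Geographical")]
    return [(b, l) for b, l in product(bacteria, locations)
            if b["sentence"] == l["sentence"]]        # drop cross-sentence pairs

entities = [
    {"id": "T15", "type": "Bacteria", "sentence": 0},
    {"id": "T13", "type": "Habitat",  "sentence": 0},
    {"id": "T19", "type": "Habitat",  "sentence": 2},
]
print(candidate_pairs(entities))   # only the (T15, T13) pair survives
```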
9
Shortest Dependency Path
Methods by TurkuNLP. Shortest Dependency Path:
1. Parse with the BLLIP parser (Charniak and Johnson, 2005) using the biomedical domain model created by McClosky (2010).
2. Apply the Stanford conversion tool (de Marneffe et al., 2006) to create dependency graphs.
3. Use the collapsed variant of the Stanford Dependencies (SD) representation.
10
Shortest Dependency Path
Methods by TurkuNLP. Shortest Dependency Path:
The underlying assumption: the syntactic structure connecting two entities is known to contain most of the words relevant for characterizing the relationship R(e1, e2), while excluding less relevant and uninformative words.
The dependency parse is typically directed, which yields two sub-paths, each running from an entity to the common ancestor of the two entities.
In this study, the dependency structure is instead treated as an undirected graph:
- the path always proceeds from the BACTERIA entity to the HABITAT/GEOGRAPHICAL entity (see the sketch below)
- the syntactic head of each entity is selected as the path endpoint
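A minimal sketch of the path extraction, assuming networkx and a toy collapsed-SD parse: the parse is treated as an undirected graph and the shortest path is taken from the head token of the BACTERIA entity to the head token of the HABITAT entity. The tokens and dependency types below are invented for illustration.

```python
import networkx as nx

g = nx.Graph()                                    # undirected, as described on the slide
# (head, dependent, dependency type) triples from an imagined collapsed SD parse
for head, dep, dtype in [("detected", "pneumoniae", "nsubjpass"),
                         ("detected", "patients", "prep_in"),
                         ("patients", "WARI", "prep_with")]:
    g.add_edge(head, dep, dtype=dtype)

# From the bacteria head ("pneumoniae") to the habitat head ("patients")
path = nx.shortest_path(g, source="pneumoniae", target="patients")
print(path)                                       # ['pneumoniae', 'detected', 'patients']
print([g.edges[a, b]["dtype"] for a, b in zip(path, path[1:])])
```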
11
Neural Network Architecture
Methods by TurkuNLP. Neural Network Architecture:
RNN variant: Long Short-Term Memory (LSTM)
Three separate RNN chains are used, over:
- the words
- the POS tags
- the dependency types
12
Neural Network Architecture
Methods by TurkuNLP. Neural Network Architecture (diagram labels):
- Input layer: the shortest path from the source (bacteria) to the target (habitat), plus a binary feature (0 = geographical, 1 = habitat)
- Hidden layer: 128 dimensions, sigmoid activation function
- Output: binary classification layer
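Putting the two architecture slides together, here is a sketch in tf.keras (not the authors' original 2016 Keras code): three embedding+LSTM chains over the shortest-path words, POS tags, and dependency types, concatenated with the binary habitat/geographical feature, followed by a 128-dimensional sigmoid hidden layer and a binary output. The vocabulary sizes, path length, and dropout rate are assumptions.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout, Concatenate
from tensorflow.keras.models import Model

MAX_PATH_LEN = 10
word_vocab, pos_vocab, dep_vocab = 100_000, 50, 45

def chain(vocab, dim, name):
    """One RNN chain: embedding over the shortest-path tokens, then an LSTM."""
    inp = Input(shape=(MAX_PATH_LEN,), name=name)
    emb = Embedding(vocab, dim)(inp)
    return inp, LSTM(dim)(emb)

w_in, w_out = chain(word_vocab, 200, "words")      # pre-trained word vectors in practice
p_in, p_out = chain(pos_vocab, 100, "pos_tags")
d_in, d_out = chain(dep_vocab, 350, "dep_types")
loc_in = Input(shape=(1,), name="is_habitat")      # 0 = geographical, 1 = habitat

hidden = Dense(128, activation="sigmoid")(Concatenate()([w_out, p_out, d_out, loc_in]))
hidden = Dropout(0.5)(hidden)                      # dropout on the hidden-layer output
output = Dense(1, activation="sigmoid")(hidden)    # binary Lives_In decision

model = Model([w_in, p_in, d_in, loc_in], output)
```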
13
Features and Embeddings
Methods by TurkuNLP. Features and Embeddings:
Word embeddings: pre-trained from the combined texts of all PubMed titles and abstracts and the PubMed Central Open Access (PMC OA) full-text articles (available from …).
- 200-dimensional, word2vec, skip-gram model
- the vectors of the 100,000 most frequent words are kept
- out-of-vocabulary BACTERIA mentions are instead mapped to the vector of the word "bacteria"
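An illustrative sketch (gensim API, not the authors' code) of how such pre-trained vectors could be loaded and how an out-of-vocabulary BACTERIA mention can be mapped to the vector of the word "bacteria"; the file name is a placeholder.

```python
from gensim.models import KeyedVectors

# Placeholder path to 200-dimensional PubMed/PMC skip-gram vectors.
vectors = KeyedVectors.load_word2vec_format("pubmed_pmc_200d.bin", binary=True)

def lookup(token, entity_type=None):
    """Return the pre-trained vector, falling back to 'bacteria' for unseen bacteria names."""
    if token in vectors:
        return vectors[token]
    if entity_type == "Bacteria":          # unseen bacteria mention -> generic "bacteria"
        return vectors["bacteria"]
    return None                            # otherwise left to a separately handled OOV case
```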
14
Features and Embeddings
Methods by TurkuNLP. Features and Embeddings:
- POS embeddings: 100-dimensional, initialized randomly at the beginning of training
- Dependency type embeddings: 350 dimensions
15
Methods by TurkuNLP. Training:
- Objective: binary cross-entropy, optimized with the Adam algorithm via back-propagation
- 4 training epochs
- L1 and L2 weight regularization helped little
- Dropout is applied on the output of the hidden layers
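Continuing the architecture sketch above, the training setup described on this slide might look as follows in tf.keras; the batch size and the random placeholder data are assumptions.

```python
import numpy as np

# Placeholder training data; in the real task these come from the shortest paths.
n = 512
word_ids   = np.random.randint(0, word_vocab, (n, MAX_PATH_LEN))
pos_ids    = np.random.randint(0, pos_vocab,  (n, MAX_PATH_LEN))
dep_ids    = np.random.randint(0, dep_vocab,  (n, MAX_PATH_LEN))
is_habitat = np.random.randint(0, 2, (n, 1))
labels     = np.random.randint(0, 2, (n, 1))

# Binary cross-entropy objective, Adam optimizer, 4 epochs (as on the slide).
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([word_ids, pos_ids, dep_ids, is_habitat], labels, epochs=4, batch_size=64)
```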
16
Results: Overcoming Variance
With the limited number of training examples:
- the initial random state of the model has a large impact on performance
- remedy: voting over several independently initialized models (see the sketch below)
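A hedged sketch of the voting idea: several copies of the model, differing only in their random initialization, are trained, and a candidate relation is kept when a majority of the copies predict it. The decision threshold and the exact majority rule are assumptions about details the slide leaves open.

```python
import numpy as np

def ensemble_predict(models, inputs, threshold=0.5):
    """Majority vote over the binary decisions of independently initialized models."""
    votes = np.stack([(m.predict(inputs) > threshold).astype(int) for m in models])
    return (votes.sum(axis=0) > len(models) / 2).astype(int)
```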
17
Results: On the Test Set
- The model is trained on the training set plus the development set.
- All potential relations between entities belonging to different sentences are ignored, which leads to low recall.
18
Conclusions
- The NN model with pre-trained word embeddings improves precision.
- The approach is complicated: many possible model architectures, regularization/training methods, and parameters.
Future study:
- Pre-trained POS and dependency type embeddings
- Different amounts of training data
- The cross-sentence problem: create an artificial "paragraph" node connected to all sentence roots
19
This presentation is based on: Mehryary, F., Björne, J., Pyysalo, S., Salakoski, T., & Ginter, F. (2016). Deep Learning with Minimal Training Data: TurkuNLP Entry in the BioNLP Shared Task 2016. ACL 2016, 73.
Reference: Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. In Interspeech (Vol. 2, p. 3).
20
Thank you!