Presentation on theme: "Intelligent Information System Lab"— Presentation transcript:

1 Intelligent Information System Lab
Deep Learning, Chapter 10: Sequence Modelling: Recurrent and Recursive Nets
Dinesh Maharjan
Intelligent Information System Lab

2 Overview
- 10. Background
- 10.1 Unfolding Computational Graphs
- 10.2 Recurrent Neural Networks
  - Teacher Forcing and Networks with Output Recurrence

3 Background
Recurrent neural networks, or RNNs (Rumelhart et al., 1986a):
- A family of neural networks for processing sequential data x(1), x(2), ..., x(t)
- Can scale to much longer sequences
- Can process sequences of variable length
- Share the same weights (parameters) across several time steps
- Parameter sharing is important when the same piece of information can occur at multiple positions: "I went to Nepal in 2009." vs. "In 2009, I went to Nepal."

4 Traditional Machine Learning
- Take input
- Evaluate output
- Evaluate error
- Adjust the parameters
- Learn the model
- Use the model for new inputs
- E.g., linear regression, neural networks

5 Limitations of Traditional Machine Learning
- Cannot learn nonlinear functions efficiently
- Inefficient at image recognition, speech processing, sequence-data generation, and natural language processing
- Assumes independence among the input data
- Cannot learn from variable-length inputs
- Takes all input data at once

6 Deep Learning
- Deep learning combines simpler concepts to build complex concepts
- For example, to recognize an object it takes raw data from the object, extracts abstract features of the object such as edges and corners, and finally recognizes the object
- Deep learning has produced breakthrough improvements in image classification, speech recognition, and natural language processing
- A deep network contains two or more hidden layers
- Deep networks can represent complex functions
- Nodes in a hidden layer can represent certain features
- Deep learning architectures include deep networks, convolutional networks, RNNs, etc.

7 Limitations of Deep Neural Networks
- Cannot model sequential or temporal data
- Assume that there is no dependency among the input data
- Must be trained on fixed-length inputs, with all input data given at once

8 10.1 Unfolding Computational Graphs
- A computational graph is a way to formalize a set of computations
- It maps inputs and parameters to outputs
- Consider the following dynamical system: s(t) = f(s(t-1); θ)
- s(t) is the state of the system, a function of its previous state and some parameters θ
- The system is recurrent because the current state refers back to the previous state

9 10.1 Unfolding Computational Graphs
- The dynamical system may also be driven by an external signal x(t), called the input: s(t) = f(s(t-1), x(t); θ)
- The recurrence can be unfolded for a finite number of steps
- The unfolded recurrence can be represented as a directed acyclic computational graph
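To make the unfolding concrete, here is a minimal Python sketch (the function names, the tanh transition, and the toy sizes are illustrative assumptions, not from the slides). It unrolls s(t) = f(s(t-1), x(t); θ) for a finite number of steps, reusing the same parameters θ at every step:

```python
import numpy as np

def f(s_prev, x_t, theta):
    # One illustrative choice of transition function: a squashing
    # nonlinearity over a weighted combination of previous state and input.
    return np.tanh(theta["W"] @ s_prev + theta["U"] @ x_t)

def unfold(s0, xs, theta):
    """Unroll the recurrence s(t) = f(s(t-1), x(t); theta).

    The same theta is reused at every time step: this is the parameter
    sharing that lets one model handle sequences of any length.
    """
    states = []
    s = s0
    for x_t in xs:              # one iteration per time step
        s = f(s, x_t, theta)    # current state depends on previous state
        states.append(s)
    return states

# Toy usage: 3-d state, 2-d inputs, 5 time steps.
rng = np.random.default_rng(0)
theta = {"W": rng.normal(size=(3, 3)) * 0.1,
         "U": rng.normal(size=(3, 2)) * 0.1}
xs = [rng.normal(size=2) for _ in range(5)]
states = unfold(np.zeros(3), xs, theta)
print(len(states), states[-1].shape)   # 5 (3,)
```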

10 10.1 Advantages of Unfolding
- Provides an explicit description of the computations
- Shows how information flows through the network
- Regardless of the sequence length, the learned model always has the same input size
- The same transition function f, with the same parameters, can be used at every time step (as in the sketch above)
- Parameter sharing allows the model to be learned from fewer training examples

11 10.2 Recurrent Neural Networks
- The recurrent neural network is one of the deep learning architectures
- RNNs are used to model sequential or temporal data
- An RNN considers the hidden state at the previous time step
- It shares the same parameters (the input-to-hidden, hidden-to-hidden, and hidden-to-output weights) across all time steps
- Nowadays extensively used in language modeling, speech recognition, machine translation, lyrics generation, etc.

12 10.2 Design Patterns of RNN
An RNN with an output at each time step and recurrent connections only from the output at one time step to the hidden units at the next:
- y is the target output; L is the loss, which measures how far the output o is from y
- o is the output, x is the input
- W is the recurrent weight matrix (here carrying the output back to the hidden units)
- U is the weight matrix between input units and hidden units
- V is the weight matrix between hidden units and output units
- Because only the output is fed back, it lacks important information about the past
- Easier to train and can be parallelized: the gradient at each time step can be computed in isolation

13 10.2 Design Patterns of RNN
An RNN with recurrent connections between hidden units that reads an entire sequence and produces a single output:
- Used to summarize a sequence of data
- A many-to-one design (see the sketch below)
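A minimal sketch of this many-to-one pattern, assuming NumPy and toy dimensions of my own choosing: the hidden state is updated across the whole sequence, and a single output is produced only after the last step.

```python
import numpy as np

def summarize_sequence(xs, U, W, V, b, c):
    """Many-to-one RNN: read the entire sequence, produce a single output.

    xs : list of input vectors x(1), ..., x(T)
    U  : input-to-hidden weights, W : hidden-to-hidden weights
    V  : hidden-to-output weights, b, c : biases
    """
    h = np.zeros(W.shape[0])            # initial hidden state h(0)
    for x_t in xs:                      # hidden-to-hidden recurrence
        h = np.tanh(b + W @ h + U @ x_t)
    return c + V @ h                    # single output after the last step

# Toy usage: summarize a length-6 sequence of 4-d inputs into a 2-d output.
rng = np.random.default_rng(1)
U = rng.normal(size=(8, 4)) * 0.1; W = rng.normal(size=(8, 8)) * 0.1
V = rng.normal(size=(2, 8)) * 0.1; b = np.zeros(8); c = np.zeros(2)
xs = [rng.normal(size=4) for _ in range(6)]
print(summarize_sequence(xs, U, W, V, b, c))   # one 2-d summary vector
```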

14 10.2 Design Patterns of RNN
An RNN with an output at each time step and recurrent connections between hidden units:
- A universal RNN: it can compute any function computable by a Turing machine
- A Turing machine is an idealized model for mathematical computation
- It consists of a line of cells (a tape) and an element called a head

15 10.2 Forward Propagation of RNN
- The activation function for the hidden units is tanh: a(t) = b + W h(t-1) + U x(t), h(t) = tanh(a(t))
- Softmax produces normalized probabilities over the outputs: o(t) = c + V h(t), ŷ(t) = softmax(o(t))
- The total loss L is the sum of the losses over all time steps: L = Σ_t L(t)
- The loss function can therefore be the negative log-likelihood of the targets y given the inputs x: L(t) = -log p_model(y(t) | x(1), ..., x(t))
- We minimize this loss with gradient descent, computing gradients by back-propagation through time (BPTT)
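The equations above can be turned into a short NumPy sketch (variable names and toy sizes are my own; this illustrates the computation, not the book's code):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())             # subtract max for numerical stability
    return e / e.sum()

def rnn_forward(xs, ys, U, W, V, b, c):
    """Forward propagation with a loss at every time step.

    Returns hidden states, output probabilities, and the total
    negative log-likelihood L = sum_t -log p(y(t) | x(1..t)).
    """
    h = np.zeros(W.shape[0])
    total_loss, hs, probs = 0.0, [], []
    for x_t, y_t in zip(xs, ys):
        a = b + W @ h + U @ x_t         # a(t) = b + W h(t-1) + U x(t)
        h = np.tanh(a)                  # h(t) = tanh(a(t))
        o = c + V @ h                   # o(t) = c + V h(t)
        p = softmax(o)                  # yhat(t) = softmax(o(t))
        total_loss += -np.log(p[y_t])   # L(t) = -log p(y(t) | x(1..t))
        hs.append(h); probs.append(p)
    return hs, probs, total_loss

# Toy usage: 3 time steps, 5-d inputs, hidden size 8, 4 output classes.
rng = np.random.default_rng(2)
U = rng.normal(size=(8, 5)) * 0.1; W = rng.normal(size=(8, 8)) * 0.1
V = rng.normal(size=(4, 8)) * 0.1; b = np.zeros(8); c = np.zeros(4)
xs = [rng.normal(size=5) for _ in range(3)]; ys = [0, 3, 1]
_, _, L = rnn_forward(xs, ys, U, W, V, b, c)
print(L)   # total negative log-likelihood over the 3 steps
```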

16 10.2 Forward Propagation of RNN
- x(t) and y(t) are the input and output at time t
- h(t) is the hidden state at time t
- Similarly, x(t+2), h(t+2), and y(t+2) are the input, hidden state, and output at time t+2
- Each hidden state also takes input from the hidden state at the previous time step

17 Teacher Forcing
- During training, the hidden units receive a connection from the actual (ground-truth) output of the previous time step
- At test time, on new data, we do not know the true output
- In that case we connect the hidden units to the predicted output instead
- Training the RNN on the ground-truth outputs is called teacher forcing
- Since there are no hidden-to-hidden connections, there is no need for BPTT

18 Teacher Forcing
- Teacher forcing allows us to maximize the likelihood of the true outputs given the sequence of inputs
- Disadvantage: at test time the network's own output is fed back as input, which it may never have seen during training
- One remedy is to mitigate the gap between training-time and test-time inputs, e.g., by randomly mixing ground-truth and model-generated outputs during training (Bengio et al., 2015b)

19 Thank you for listening.

