1
NLP
2
Libraries for Deep Learning
3
Matrix Multiplication in Python
4
Matrix Multiplication in Numpy
5
Libraries for Deep Learning
Torch (Lua); PyTorch (Python)
TensorFlow (Python and C++)
Theano (Python): no longer maintained
Keras, PaddlePaddle, CNTK
6
Libraries for Deep Learning: Tensorflow (slides by Jason Chu)
7
What is TensorFlow?
Open-source software library for numerical computation using data flow graphs
Developed by the Google Brain Team for machine learning and deep learning, and made open source
TensorFlow provides an extensive suite of functions and classes that allow users to build various models from scratch
These slides are adapted from the following Stanford lectures:
8
What’s a tensor?
Formally, tensors are multilinear maps from vector spaces to the real numbers
Think of them as n-dimensional arrays: 0-d tensors are scalars, 1-d tensors are vectors, 2-d tensors are matrices, etc.
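As a concrete illustration (a minimal sketch in TensorFlow 1.x, the version used throughout these slides), tensors of different ranks can be created with tf.constant:

import tensorflow as tf

scalar = tf.constant(3.0)                  # 0-d tensor (scalar), shape ()
vector = tf.constant([1.0, 2.0, 3.0])      # 1-d tensor (vector), shape (3,)
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])         # 2-d tensor (matrix), shape (2, 2)

print(scalar.shape, vector.shape, matrix.shape)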
9
Some Basic Terminology
Dataflow graph: the entire computation
Data nodes: individual data or operations
Edges: implicit dependencies between nodes
Operations: any computation
Constants: single values (tensors)
“TensorFlow programs are usually structured into a construction phase, that assembles a graph, and an execution phase that uses a session to execute ops in the graph.” - TensorFlow docs
All nodes return tensors, or higher-dimensional matrices
You are metaprogramming: no computation occurs yet!
10
Data Flow Graphs

import tensorflow as tf
a = tf.add(2, 3)
print(a)
>> Tensor("Add:0", shape=(), dtype=int32)

TF automatically names nodes if you do not (in the graph diagram, the inputs x = 2 and y = 3 feed the node a)
Note: a is NOT 5
11
TensorFlow Session
Session object encapsulates the environment in which Operation objects are executed and Tensor objects, like a in the previous slide, are evaluated

import tensorflow as tf
a = tf.add(2, 3)
with tf.Session() as sess:
    print(sess.run(a))
12
TensorFlow Sessions
There are 3 arguments for a Session, all of which are optional:
target: the execution engine to connect to
graph: the Graph to be launched
config: a ConfigProto protocol buffer with configuration options for the session
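For instance (a minimal sketch, TensorFlow 1.x), the config argument is often used to control device placement logging and GPU memory allocation:

import tensorflow as tf

a = tf.add(2, 3)
config = tf.ConfigProto(allow_soft_placement=True,   # fall back to another device if an op has no kernel there
                        log_device_placement=True)   # log which device each op runs on
config.gpu_options.allow_growth = True               # grab GPU memory as needed instead of all at once
with tf.Session(config=config) as sess:
    print(sess.run(a))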
13
TensorFlow Variables
“When you train a model you use variables to hold and update parameters. Variables are in-memory buffers containing tensors” - TensorFlow docs
TensorFlow variables must be initialized before they have values
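A minimal sketch (TensorFlow 1.x) of creating a variable and initializing it before reading its value:

import tensorflow as tf

W = tf.Variable(tf.random_normal([2, 3]), name="W")   # in-memory buffer containing a tensor
init = tf.global_variables_initializer()              # op that initializes all declared variables

with tf.Session() as sess:
    sess.run(init)             # without this, W has no value yet
    print(sess.run(W))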
14
Placeholders and Feed Dictionaries
You can input data from Numpy using tf.convert_to_tensor, but this is not scalable
Instead, use tf.placeholder variables (dummy nodes that provide entry points for data into the computational graph)
A feed_dict is a Python dictionary mapping from tf.placeholder vars (or their names) to data (numpy arrays, lists, etc.)

input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.multiply(input1, input2)   # tf.mul in older TF versions
with tf.Session() as sess:
    print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))
15
Variable Scope
tf.variable_scope() provides simple name-spacing to avoid clashes of variable names
tf.get_variable() creates/accesses variables from within a variable scope
tf.get_variable_scope().reuse_variables() marks the current scope as reusing its variables

with tf.variable_scope("foo"):
    with tf.variable_scope("bar"):
        v = tf.get_variable("v", [1])
        assert v.name == "foo/bar/v:0"
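A minimal sketch of sharing one variable between two calls via variable scopes (TensorFlow 1.x; the helper function and names are illustrative):

import tensorflow as tf

def linear(x):
    # tf.get_variable creates "w" on the first call and reuses it when reuse is enabled
    w = tf.get_variable("w", shape=[3, 3])
    return tf.matmul(x, w)

x = tf.placeholder(tf.float32, shape=[None, 3])

with tf.variable_scope("layer"):
    y1 = linear(x)
with tf.variable_scope("layer", reuse=True):   # or call tf.get_variable_scope().reuse_variables()
    y2 = linear(x)                             # same underlying variable "layer/w:0"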
16
Linear Regression Example (slides 16-19; the worked example is shown as figures that are not captured in this transcript)
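Since that code is not captured here, the following is a minimal sketch of what a TensorFlow 1.x linear regression graph can look like (the toy data, learning rate, and variable names are illustrative, not taken from the original slides):

import numpy as np
import tensorflow as tf

# toy data: y = 3x + 2 plus noise
x_data = np.random.rand(100).astype(np.float32)
y_data = 3.0 * x_data + 2.0 + 0.1 * np.random.randn(100).astype(np.float32)

X = tf.placeholder(tf.float32, [None])
Y = tf.placeholder(tf.float32, [None])
w = tf.Variable(0.0)
b = tf.Variable(0.0)

pred = w * X + b
loss = tf.reduce_mean(tf.square(pred - Y))                       # mean squared error
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op, feed_dict={X: x_data, Y: y_data})
    print(sess.run([w, b]))    # should approach 3.0 and 2.0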
20
Computation Graphs in Tensorflow
21
Homework 4
Some useful functions:
tf.expand_dims(input, axis=None, name=None, dim=None)
Inserts a dimension of 1 into a tensor’s shape
If t is a tensor of shape [2], then tf.expand_dims(t, 0) has shape [1, 2]
tf.gather(params, indices, validate_indices=None, name=None, axis=0)
Gathers the elements at the passed-in indices along the given axis of params
x = [1, 2, 3, 4, 3, 2, 1]
tf.gather(x, 3).eval()   # -> 4
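A short illustrative example of both functions (TensorFlow 1.x; an InteractiveSession is used so that .eval() works):

import tensorflow as tf

sess = tf.InteractiveSession()

t = tf.constant([5, 6])                  # shape [2]
print(tf.expand_dims(t, 0).eval())       # [[5 6]]  -> shape [1, 2]
print(tf.expand_dims(t, 1).eval())       # [[5] [6]] -> shape [2, 1]

x = tf.constant([1, 2, 3, 4, 3, 2, 1])
print(tf.gather(x, 3).eval())            # 4
print(tf.gather(x, [0, 3, 6]).eval())    # [1 4 1]

sess.close()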
22
More functions
tf.reduce_sum(input_tensor, axis=None, keepdims=None, name=None, reduction_indices=None, keep_dims=None)
Computes the sum of elements across dimensions of a tensor
axis: the dimensions to reduce. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor))
Ex:
x = tf.constant([[1, 1, 1], [1, 1, 1]])
tf.reduce_sum(x)           # 6
tf.reduce_sum(x, axis=0)   # [2, 2, 2]
tf.reduce_sum(x, axis=1)   # [3, 3]
tf.matmul(a, b, transpose_a=False, transpose_b=False, adjoint_a=False, adjoint_b=False, a_is_sparse=False, b_is_sparse=False, name=None)
Multiplies matrix a by matrix b
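A short illustrative example of tf.matmul (TensorFlow 1.x):

import tensorflow as tf

a = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])          # shape [2, 3]
b = tf.constant([[7., 8.],
                 [9., 10.],
                 [11., 12.]])            # shape [3, 2]

c = tf.matmul(a, b)                      # shape [2, 2]
d = tf.matmul(a, a, transpose_b=True)    # [2, 3] x [3, 2] -> [2, 2]

with tf.Session() as sess:
    print(sess.run(c))   # [[ 58.  64.] [139. 154.]]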
23
tf.nn Module
Provides functions for neural network support
tf.nn.l2_loss(t, name=None): computes half the L2 norm of a tensor, without the sqrt: sum(t ** 2) / 2
tf.nn.relu(features, name=None): computes the rectified linear unit (ReLU) activation function, f(x) = max(0, x)
tf.nn.sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)
Computes sparse softmax cross entropy between logits and labels; measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class)
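A minimal sketch of these functions in use (TensorFlow 1.x; the logits, labels, and shapes are illustrative):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])   # [batch=2, classes=3]
labels = tf.constant([0, 1])              # one integer class id per example

hidden = tf.constant([[-1.0, 0.5, 2.0]])
activated = tf.nn.relu(hidden)            # [[0.  0.5  2. ]]
weight_penalty = tf.nn.l2_loss(hidden)    # sum(t ** 2) / 2

xent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(xent)

with tf.Session() as sess:
    print(sess.run([activated, weight_penalty, loss]))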
24
tf.train Module
Module for training support: choose an optimizer to perform the optimization; many different types of optimizer are available
Class AdamOptimizer
Optimizer that implements the Adam algorithm
The Adam algorithm can be found here:
Class GradientDescentOptimizer
Implements the gradient descent algorithm
Calling optimizer.minimize() returns an Operation (computation) object
Adam can use a larger effective step size than GradientDescentOptimizer and will typically converge without much tuning of that step size, but it requires more computation and more state/storage
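A minimal sketch of using an optimizer (TensorFlow 1.x; the toy loss and learning rates are illustrative):

import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 3.0)    # minimized at w = 3

# minimize() returns an Operation; either optimizer is used the same way
train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
# train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(500):
        sess.run(train_op)
    print(sess.run(w))       # close to 3.0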
25
tf.argmax()
tf.argmax(input, axis=None, name=None, dimension=None, output_type=tf.int64)
Returns the index with the largest value across axes of a tensor
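A short illustrative example; tf.argmax is commonly used to turn logits into predicted class ids (TensorFlow 1.x):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
preds = tf.argmax(logits, axis=1)                 # [0, 1]
labels = tf.constant([0, 2], dtype=tf.int64)
accuracy = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))

with tf.Session() as sess:
    print(sess.run([preds, accuracy]))            # [0 1], 0.5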
26
Libraries for Deep Learning: PyTorch (slides by Rui Zhang)
27
PyTorch Tensor

import torch
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
print(mat1)
print(mat2)
28
Matrix Multiplication in PyTorch
import torch
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
res = torch.mm(mat1, mat2)
print(res.size())

Output: (2L, 3L)
29
Batch Matrix Multiplication in PyTorch
import torch
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
print(res.size())

Output: (10L, 3L, 5L)
30
Many Tensor operations in PyTorch……
torch.mm: matrix multiplication
torch.bmm: batch matrix multiplication
torch.cat: tensor concatenation
torch.squeeze / torch.unsqueeze: remove or insert dimensions of size 1
…
Check documentation at
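A short illustrative example of a few of these operations (the resulting shapes are shown in the comments):

import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

print(torch.cat([a, b], 0).size())     # (4, 3): concatenate along dimension 0
print(torch.cat([a, b], 1).size())     # (2, 6): concatenate along dimension 1

c = torch.randn(1, 5, 1)
print(torch.squeeze(c).size())         # (5,): remove all size-1 dimensions
print(torch.unsqueeze(a, 0).size())    # (1, 2, 3): insert a size-1 dimension at position 0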
31
PyTorch Variables
A PyTorch Variable is a wrapper around a PyTorch Tensor, and represents a node in a computational graph

import torch
from torch.autograd import Variable

# PyTorch Tensors
x = torch.ones(2, 2)
y = torch.ones(2, 1)
w = torch.randn(2, 1)
b = torch.randn(1)

# PyTorch Variables
x = Variable(x, requires_grad=False)
y = Variable(y, requires_grad=False)
w = Variable(w, requires_grad=True)
b = Variable(b, requires_grad=True)
32
Computational Graphs

# Computational Graph
p_1 = torch.sigmoid(torch.mm(x, w) + b)                      # prediction
xent = -y * torch.log(p_1) - (1 - y) * torch.log(1 - p_1)    # cross-entropy loss
cost = xent.mean()                                           # the cost to minimize
33
Automatic Gradient Computation
# Computational Graph
p_1 = torch.sigmoid(torch.mm(x, w) + b)                      # prediction
xent = -y * torch.log(p_1) - (1 - y) * torch.log(1 - p_1)    # cross-entropy loss
cost = xent.mean()                                           # the cost to minimize

cost.backward()
print(w.grad)
print(b.grad)
34
Build Neural Networks using PyTorch
Neural networks can be constructed using the torch.nn package
Forward: an nn.Module contains layers, and a method forward(input) that returns the output
You can use any of the Tensor operations in the forward function
Backward: nn depends on autograd to define models and differentiate them
You just have to define the forward function; the backward function (where gradients are computed) is automatically defined for you using autograd
35
Define a Network Class
You don’t need to define a backward function!
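The slide's code is not captured in this transcript; a minimal sketch of such a network class (with hypothetical layer sizes) could look like this:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)   # hypothetical sizes for flattened 28x28 input
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# backward() is defined automatically by autograd
net = Net()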
36
CNN for MNIST: A Full Example
Example from
37
Define a CNN Network Class
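The original slide's code is not captured here; a minimal sketch of a small CNN for 28x28 grayscale MNIST images (layer sizes are illustrative, loosely in the style of the official PyTorch tutorial) could be:

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)          # 1 input channel, 6 filters of size 5x5
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)    # 28 -> 24 -> 12 -> 8 -> 4 after convs and pooling
        self.fc2 = nn.Linear(120, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)                # flatten to [batch, 256]
        x = F.relu(self.fc1(x))
        return self.fc2(x)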
38
Compute Loss
The input is a random image; the target is a dummy label
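A minimal sketch of this step, assuming the CNN class sketched above (the random input and dummy label stand in for real data):

import torch
import torch.nn as nn
from torch.autograd import Variable

net = CNN()                                   # the class sketched on the previous slide
input = Variable(torch.randn(1, 1, 28, 28))   # a random "image"
target = Variable(torch.LongTensor([3]))      # a dummy class label

output = net(input)                           # shape (1, 10)
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)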
39
Backpropagation
Use the torch.optim package to do backpropagation
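A minimal sketch of one update step with torch.optim, continuing the example above (the optimizer choice and learning rate are illustrative):

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)

optimizer.zero_grad()    # clear gradients left over from the previous step
output = net(input)
loss = criterion(output, target)
loss.backward()          # backpropagate: compute gradients of the loss w.r.t. the parameters
optimizer.step()         # update the parameters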
40
Links About Deep Learning
AAN: our search engine for resources and papers
Richard Socher’s Stanford class
41
Libraries for Deep Learning: Theano (Slides by Rui Zhang)
(for reference only)
42
Matrix Multiplication in Theano
import theano
import theano.tensor as T
import numpy as np

# “symbolic” variables
x = T.matrix('x')
y = T.matrix('y')

dot = T.dot(x, y)
43
Matrix Multiplication in Theano
import theano
import theano.tensor as T
import numpy as np

# “symbolic” variables
x = T.matrix('x')
y = T.matrix('y')

dot = T.dot(x, y)

# this is the slow part
f = theano.function([x, y], [dot])

# now we can use this function
a = np.random.random((2, 3))
b = np.random.random((3, 4))
c = f(a, b)   # now a 2 x 4 array
44
Sigmoid in Theano

x = T.vector('x')   # ('in' is a reserved word in Python, so the vector is named x here)
sigmoid = 1 / (1 + T.exp(-x))

# same as
sigmoid = T.nnet.sigmoid(x)
45
Shared Variables vs Symbolic Variables
# This is symbolic
x = T.matrix('x')

# shared means that it is not symbolic
w = theano.shared(np.random.randn(n))
b = theano.shared(0.)
46
Computational Graph

# This is symbolic
x = T.matrix('x')

# shared means that it is not symbolic
w = theano.shared(np.random.randn(n))
b = theano.shared(0.)

# Computational Graph
p_1 = sigmoid(T.dot(x, w) + b)
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)   # cross-entropy
cost = xent.mean()                                  # the cost to minimize
47
Automatic Gradient Computation
p_1 = sigmoid(T.dot(x, w) + b)
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)   # cross-entropy
cost = xent.mean()                                  # the cost to minimize

gw, gb = T.grad(cost, [w, b])
48
Compile a Function

train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
49
Computation Graphs in Theano
50
LSTM Sentiment Analysis Demo
If you’re new to deep learning and want to work with Theano, do yourself a favor and work through:
An LSTM demo is described here:
Sentiment analysis model trained on IMDB movie reviews
51
LSTMs: One Time Step
(diagram of a single LSTM cell: input x1, previous hidden state h0 and cell state c0, gates i1, f1, o1, candidate c̃1, σ and tanh nonlinearities, outputs c1 and h1)
This is the first LSTM cell in a sequence. x1 is the first input, in this case a word vector for the first word in the sequence. i1 is our input gate, f1 is our forget gate, and o1 is our output gate. We take our input hidden vector h0 and our input x1, apply weights to them, and take the sigmoid to get the gates; applying tanh instead gives us our candidate c̃1 for the current time step. We then pass that candidate through the input gate, pass c0 from the previous timestep through the forget gate, and their sum is our actual c1 for this timestep. Finally, we apply tanh to c1 and pass the result through the output gate to give us our h1 for this timestep. [Slides from Catherine Finegan-Dollak]
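For reference, one standard way to write this first time step (sigmoid gates, a tanh candidate, and elementwise products) is:

\begin{aligned}
i_1 &= \sigma(W_i x_1 + U_i h_0 + b_i) \\
f_1 &= \sigma(W_f x_1 + U_f h_0 + b_f) \\
o_1 &= \sigma(W_o x_1 + U_o h_0 + b_o) \\
\tilde{c}_1 &= \tanh(W_c x_1 + U_c h_0 + b_c) \\
c_1 &= f_1 \odot c_0 + i_1 \odot \tilde{c}_1 \\
h_1 &= o_1 \odot \tanh(c_1)
\end{aligned}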
52
LSTMs: Building a Sequence
(figure: a chain of LSTM cells reading the sequence “The cat sat on …”)
53
Theano Implementation of an LSTM Step
(lstm.py, L. 174)

def _step(m_, x_, h_, c_):
    preact = tensor.dot(h_, tparams[_p(prefix, 'U')])
    preact += x_

    i = tensor.nnet.sigmoid(_slice(preact, 0, options['dim_proj']))
    f = tensor.nnet.sigmoid(_slice(preact, 1, options['dim_proj']))
    o = tensor.nnet.sigmoid(_slice(preact, 2, options['dim_proj']))
    c = tensor.tanh(_slice(preact, 3, options['dim_proj']))

    c = f * c_ + i * c
    c = m_[:, None] * c + (1. - m_)[:, None] * c_

    h = o * tensor.tanh(c)
    h = m_[:, None] * h + (1. - m_)[:, None] * h_

    return h, c

“preact” is the sum of Wx with the dot product of the previous step’s h with the weight matrix U; U concatenates Ui, Uf, Uo, and Uc for computational efficiency, and W does the same with all the W matrices. The _slice function then splits the dot product back out again to generate the three gates, i, f, and o, and the candidate c̃. m_ is a mask, used for dealing with variable-length input.
54
theano.scan iterates through a series of steps
rval, updates = theano.scan(_step,
                            sequences=[mask, state_below],
                            outputs_info=[tensor.alloc(numpy_floatX(0.),
                                                       n_samples,
                                                       dim_proj),
                                          tensor.alloc(numpy_floatX(0.),
                                                       n_samples,
                                                       dim_proj)],
                            name=_p(prefix, '_layers'),
                            n_steps=nsteps)

(lstm.py, L. 195)
55