Word Embedding

A nice explanation of word embeddings:

https://devblogs.nvidia.com/parallelforall/understanding-natural-language-deep-neural-networks-using-torch/

Word embeddings are not unique to neural networks; they are common to all word-level neural language models. Embeddings are stored in a simple lookup table (or hash table) that, given a word, returns its embedding (an array of numbers). Figure 1 (see the reference link) shows an example.

Word embeddings are usually initialized to random numbers and learned during the training phase of the neural network, or initialized from models previously trained on large corpora such as Wikipedia.
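
As a rough sketch of that lookup-table view (the vocabulary size, embedding size and word index below are made up):

require 'nn'

vocabSize, embeddingSize = 10000, 100          -- illustrative sizes
embed = nn.LookupTable(vocabSize, embeddingSize) -- weights start out random and are learned during training

wordIndex = torch.LongTensor({42})             -- a word is just an index into the dictionary
vector = embed:forward(wordIndex)              -- 1x100 array of numbers for that word
print(vector:size())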

Feed-forward Convolutional Neural Networks

Convolutional Neural Networks (ConvNets), which were covered in a previous Parallel Forall post by Evan Shelhamer, have enjoyed wide success in the last few years in several domains including images, video, audio and natural language processing.

When applied to images, ConvNets usually take raw image pixels as input, interleaving convolution layers along with pooling layers with non-linear functions in between, followed by fully connected layers. Similarly, for language processing, ConvNets take the outputs of word embeddings as input, and then apply interleaved convolution and pooling operations, followed by fully connected layers. Figure 2 shows an example ConvNet applied to sentences.

Recurrent Neural Networks (RNN)

Convolutional Neural Networks—and more generally, feed-forward neural networks—do not traditionally have a notion of time or experience unless you explicitly pass samples from the past as input. After they are trained, given an input, they treat it no differently when shown the input the first time or the 100th time. But to tackle some problems, you need to look at past experiences and give a different answer.

If you send sentences word-by-word into a feed-forward network, asking it to predict the next word, it will do so, but without any notion of the current context. The animation in Figure 3 shows why context is important. Clearly, without context, you can produce sentences that make no sense. You can have context in feed-forward networks, but it is much more natural to add a recurrent connection.

A recurrent neural network has the capability to give itself feedback from past experiences. Apart from all the neurons in the network, it maintains a hidden state that changes as it sees different inputs. This hidden state is analogous to short-term memory: it remembers past experiences and bases the current answer on both the current input and those past experiences. An illustration is shown in Figure 4 (see the reference link).
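
A minimal sketch of this idea with plain Torch tensors (sizes and weights are random, purely illustrative): the hidden state h is the short-term memory, and each new output depends on both the current input and h.

require 'torch'

inputSize, hiddenSize = 5, 3
Wxh = torch.randn(hiddenSize, inputSize)   -- input-to-hidden weights
Whh = torch.randn(hiddenSize, hiddenSize)  -- hidden-to-hidden (recurrent) weights
b   = torch.randn(hiddenSize)

local function rnnStep(x, hPrev)
  return torch.tanh(Wxh * x + Whh * hPrev + b)
end

h = torch.zeros(hiddenSize)                -- empty memory at the start
for t = 1, 4 do
  h = rnnStep(torch.randn(inputSize), h)   -- h now depends on everything seen so far
end
print(h)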

Long Short Term Memory (LSTM)

RNNs keep context in their hidden state (which can be seen as memory). However, classical recurrent networks forget context very fast. They take into account very few words from the past while doing prediction. Here is an example of a language modelling problem that requires longer-term memory.

I bought an apple … I am eating the _____

The probability of the word “apple” should be much higher than any other edible like “banana” or “spaghetti”, because the previous sentence mentioned that you bought an “apple”. Furthermore, any edible is a much better fit than non-edibles like “car”, or “cat”.

Long Short Term Memory (LSTM) [6] units try to address the problem of such long-term dependencies. LSTM has multiple gates that act as a differentiable RAM memory. Access to memory cells is guarded by “read”, “write” and “erase” gates. Information stored in memory cells is available to the LSTM for a much longer time than in a classical RNN, which allows the model to make more context-aware predictions. An LSTM unit is shown in Figure 5.

Exactly how LSTM works is unclear, and fully understanding it is a topic of contemporary research. However, it is known that LSTM outperforms conventional RNNs on many tasks.

Torch + cuDNN + cuBLAS: Implementing ConvNets and Recurrent Nets efficiently

Torch is a scientific computing framework with packages for neural networks and optimization (among hundreds of others). It is based on the Lua language, which is similar to JavaScript and acts as a wrapper around optimized C/C++ and CUDA code.

At the core of Torch is a powerful tensor library similar to NumPy. The Torch tensor library has both CPU and GPU backends. The neural network package (nn) in Torch implements modules, which are different kinds of neuron layers, and containers, which can hold several modules. Modules are like Lego blocks: they can be plugged together to form complicated neural networks.

Each module implements a function and its derivative. This makes it easy to calculate the derivative of any neuron in the network with respect to the objective function of the network (via the chain rule). The objective function is simply a mathematical formula to calculate how well a model is doing on the given task. Usually, the smaller the objective, the better the model performs.

The following small example shows how to calculate the element-wise Tanh of an input vector by creating an nn.Tanh module and passing the input through it. The derivative with respect to the objective is calculated by passing the objective's derivative in the backward direction.

require 'nn'

input = torch.randn(100)
m = nn.Tanh()
output = m:forward(input)
ObjectiveDerivative = torch.randn(100) -- stand-in for the derivative of the objective w.r.t. the output
InputDerivative = m:backward(input, ObjectiveDerivative)

Implementing the ConvNet shown in Figure 2 is also very simple with Torch. In this example, we put all the modules into a Sequential container that chains the modules one after the other.

nWordsInDictionary = 100000
embeddingSize = 100
sentenceLength = 5
nClasses = 2 -- number of output classes (assumed; not specified in the original)

m = nn.Sequential() -- a container that chains modules one after another
m:add(nn.LookupTable(nWordsInDictionary, embeddingSize))
m:add(nn.TemporalConvolution(embeddingSize, 150, 3)) -- 150 filters over windows of 3 word vectors (kernel width assumed)
m:add(nn.Max(1)) -- max-pooling over time
m:add(nn.Linear(150, 1024))
m:add(nn.HardTanh())
m:add(nn.Linear(1024, nClasses))
 
m:cuda() -- transfer the model to GPU

This ConvNet has :forward and :backward functions that allow you to train your network (on CPUs or GPUs). Here we transfer it to the GPU by calling m:cuda().
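
As a sketch of how :forward and :backward fit into a single training step (the MSE criterion, the random input/target and the learning rate here are placeholders, not part of the original example; cunn is assumed to be loaded for the :cuda() calls):

criterion = nn.MSECriterion():cuda()                    -- placeholder objective
input  = torch.LongTensor(sentenceLength):random(1, nWordsInDictionary):cuda()
target = torch.CudaTensor(nClasses):uniform()           -- placeholder target

local output = m:forward(input)                         -- prediction
local loss   = criterion:forward(output, target)        -- value of the objective
m:zeroGradParameters()
m:backward(input, criterion:backward(output, target))   -- gradients via the chain rule
m:updateParameters(0.01)                                -- one SGD step, learning rate 0.01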

An extension to the nn package is the nngraph package which lets you build arbitrary acyclic graphs of neural networks. nngraph makes it easier to build complicated modules such as the LSTM memory unit, as the following example code demonstrates.

-- assumes require 'nngraph' and a params table (with params.rnn_size) defined elsewhere
local function lstm(i, prev_c, prev_h)
  local function new_input_sum()
    local i2h            = nn.Linear(params.rnn_size, params.rnn_size)
    local h2h            = nn.Linear(params.rnn_size, params.rnn_size)
    return nn.CAddTable()({i2h(i), h2h(prev_h)})
  end
  local in_gate          = nn.Sigmoid()(new_input_sum())
  local forget_gate      = nn.Sigmoid()(new_input_sum())
  local in_gate2         = nn.Tanh()(new_input_sum())
  local next_c           = nn.CAddTable()({
    nn.CMulTable()({forget_gate, prev_c}),
    nn.CMulTable()({in_gate,     in_gate2})
  })
  local out_gate         = nn.Sigmoid()(new_input_sum())
  local next_h           = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
  return next_c, next_h
end

With these few lines of code we can create powerful state-of-the-art neural networks, ready for execution on CPUs or GPUs with good efficiency.

cuBLAS, and more recently cuDNN, have accelerated deep learning research quite significantly, and the recent success of deep learning can be partly attributed to these libraries from NVIDIA. cuBLAS is automatically used by Torch for BLAS operations such as matrix multiplications, and it accelerates neural networks significantly compared to CPUs.

To use NVIDIA cuDNN in Torch, simply replace the nn. prefix with cudnn. cuDNN accelerates the training of neural networks compared to Torch's default CUDA backend (sometimes by up to 30%) and is often several orders of magnitude faster than using CPUs.
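
For illustration, the swap for a convolution layer might look like this (layer sizes are made up, and the cudnn.torch package is assumed to be installed):

require 'cudnn'

-- default CUDA backend:
-- conv = nn.SpatialConvolution(3, 16, 5, 5)

-- cuDNN backend, same arguments:
conv = cudnn.SpatialConvolution(3, 16, 5, 5)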

For language modeling, we’ve implemented an RNN-LSTM neural network [9] using Torch. It gives state-of-the-art results on a standard quality metric called perplexity. The full source of this implementation is available here.

We compare the training time of the network on an Intel Core i7 running at 2.6 GHz with training on an NVIDIA GeForce GTX 980 GPU. Table 2 shows the training times and GPU speedups for a small RNN and a larger RNN.

Table 2: Training times of a state-of-the-art recurrent network with LSTM cells on CPU vs GPU.

Conventional Neural Network

Figure 1: Conventional Neural Network

2.1 Lookup Table

The idea of distributed representation for symbolic data is one of the most important reasons why neural networks work. It was proposed by Hinton [11] and has been a research hot spot for more than twenty years [1, 6, 21, 16]. Formally, in the Chinese word segmentation task, we have a character dictionary D of size |D|. Unless otherwise specified, the character dictionary is extracted from the training set and unknown characters are mapped to a special symbol that is not used elsewhere. Each character c ∈ D is represented as a real-valued vector (character embedding) Embed(c) ∈ ℝ^d, where d is the dimensionality of the vector space. The character embeddings are stacked into an embedding matrix M ∈ ℝ^(d×|D|). For a character c ∈ D with associated index k, the corresponding character embedding Embed(c) ∈ ℝ^d is retrieved by the Lookup Table layer as shown in Figure 1:

Embed(c) = M e_k    (1)

Here e_k ∈ ℝ^(|D|) is a binary vector which is zero in all positions except the k-th. The Lookup Table layer can be seen as a simple projection layer in which the character embedding for each context character is obtained by a table lookup operation according to its index. The embedding matrix M is initialized with small random numbers and trained by back-propagation. We analyze the effect of character embeddings in more detail in Section 4.
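
A small sketch of equation (1) in Torch (sizes are illustrative): multiplying the embedding matrix by the one-hot vector e_k is the same as selecting the k-th column, which is exactly what the Lookup Table layer does.

d, D = 4, 6                        -- embedding dimension and dictionary size
M = torch.randn(d, D)              -- embedding matrix, one column per character
k = 3                              -- index of some character c

ek = torch.zeros(D); ek[k] = 1     -- one-hot vector e_k
viaOneHot = M * ek                 -- Embed(c) = M e_k
viaLookup = M:select(2, k)         -- equivalent: just take column k

print(torch.dist(viaOneHot, viaLookup))  -- ~0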

2.2 Tag Scoring

The most common tagging approach is the window approach, which assumes that the tag of a character largely depends on its neighboring characters. Given an input sentence c_[1:n], a window of size w slides over the sentence from character c_1 to c_n. We set w = 5 in all experiments. As shown in Figure 1, at position c_i, 1 ≤ i ≤ n, the context characters are fed into the Lookup Table layer. Characters exceeding the sentence boundaries are mapped to one of two special symbols, the "start" and "end" symbols. The character embeddings extracted by the Lookup Table layer are then concatenated into a single vector a ∈ ℝ^(H_1), where H_1 = w·d is the size of Layer 1. Then a is fed into the next layer, which performs a linear transformation followed by an element-wise activation function g such as tanh, which is used in our experiments:

h = g(W_1 a + b_1)    (2)

where W_1 ∈ ℝ^(H_2×H_1), b_1 ∈ ℝ^(H_2×1), h ∈ ℝ^(H_2). H_2 is a hyper-parameter giving the number of hidden units in Layer 2. Given a set of tags T of size |T|, a similar linear transformation is performed, except that no non-linear function follows:

f(t | c_[i-2:i+2]) = W_2 h + b_2    (3)

where W_2 ∈ ℝ^(|T|×H_2), b_2 ∈ ℝ^(|T|×1), and f(t | c_[i-2:i+2]) ∈ ℝ^(|T|) is the score vector over the possible tags. In Chinese word segmentation, the most prevalent tag set T is the BMES tag set, which uses four tags to carry word boundary information: B, M, E and S denote the Beginning, the Middle, the End of a word, and a Single-character word, respectively. We use this tag set in our method.
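
A possible sketch of this window-approach scorer as a small nn network (the dictionary size, hyper-parameter values and character indices below are made up):

require 'nn'

d, w, H2, nTags = 50, 5, 100, 4            -- embedding size, window, hidden units, |T| (BMES)
H1 = w * d
dictSize = 5000                            -- assumed character dictionary size

scorer = nn.Sequential()
scorer:add(nn.LookupTable(dictSize, d))    -- character embeddings (Section 2.1)
scorer:add(nn.Reshape(H1))                 -- concatenate the w embeddings into a in R^H1
scorer:add(nn.Linear(H1, H2))              -- W1 a + b1
scorer:add(nn.Tanh())                      -- g(.)
scorer:add(nn.Linear(H2, nTags))           -- f(t | c_[i-2:i+2]) = W2 h + b2

window = torch.LongTensor({12, 7, 301, 4, 99})  -- indices of the w context characters
print(scorer:forward(window))                    -- four scores: B, M, E, S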

2.3 Model Training and Inference

Despite sharing the commonalities mentioned above, previous work models the segmentation task differently and therefore uses different training and inference procedures. Mansur et al. [15] modeled Chinese word segmentation as a series of classification tasks at each position of the sentence, in which the tag score is transformed into a probability using the softmax function:

p(t_i | c_[i-2:i+2]) = exp(f(t_i | c_[i-2:i+2])) / Σ_{t′} exp(f(t′ | c_[i-2:i+2]))

The model is then trained in MLE style, maximizing the log-likelihood of the tagged data. It is clearly a local model which cannot capture the dependency between tags and does not support global inference of the tag sequence.

To model the tag dependency, previous neural network models [6, 35] introduce a transition score A_ij for jumping from tag i ∈ T to tag j ∈ T. For an input sentence c_[1:n] with a tag sequence t_[1:n], a sentence-level score is then given by the sum of transition and network scores:

s(c_[1:n], t_[1:n], θ) = Σ_{i=1}^{n} ( A_{t_{i-1} t_i} + f_θ(t_i | c_[i-2:i+2]) )    (4)

where f_θ(t_i | c_[i-2:i+2]) indicates the score output for tag t_i at the i-th character by the network with parameters θ = (M, A, W_1, b_1, W_2, b_2). Given the sentence-level score, Zheng et al. [35] proposed a perceptron-style training algorithm inspired by the work of Collins [5]. Compared with Mansur et al. [15], their model is a global one in which training and inference are performed at the sentence level.
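
A toy sketch of the sentence-level score in equation (4), summing transition and network scores along a candidate tag sequence (all numbers are random, and the transition into the first tag is skipped for simplicity):

nTags, n = 4, 6
A = torch.randn(nTags, nTags)                 -- transition scores A_ij
f = torch.randn(n, nTags)                     -- f_theta(t | c_[i-2:i+2]) for each position
tags = torch.LongTensor({1, 3, 4, 1, 2, 4})   -- a candidate tag sequence t_[1:n]

local s = 0
for i = 1, n do
  if i > 1 then s = s + A[tags[i-1]][tags[i]] end  -- transition term
  s = s + f[i][tags[i]]                            -- network term
end
print(s)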

Workable as these methods seem, one of their limitations is that the tag-tag interaction and the neural network are modeled separately. The simple tag-tag transition neglects the impact of context characters and thus limits the ability to capture flexible interactions between tags and context characters. Moreover, the simple non-linear transformation in equation (2) is too weak to model the complex interactional effects in Chinese word segmentation.

Ref:

https://www.aclweb.org/anthology/P/P14/P14-1028.xhtml

Ref:

http://cseweb.ucsd.edu/~dasgupta/254-deep/stefanos.pdf

Natural Language Processing (Almost) from Scratch

http://resola.ai/dev/

https://iksinc.wordpress.com/tag/continuous-bag-of-words-cbow/

 

The final layer of the network has one node for each candidate tag; each output is interpreted as the score for the associated tag.

What is a word vector?

At one level, it’s simply a vector of weights. In a simple 1-of-N (or ‘one-hot’) encoding every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one, and all other elements are zero.

Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as the one-hot vector [0, 1, 0, 0, 0].

Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.
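
A toy version of that encoding in Torch (same five-word vocabulary):

vocab = {'King', 'Queen', 'Man', 'Woman', 'Child'}
queen = torch.zeros(#vocab); queen[2] = 1   -- [0, 1, 0, 0, 0]
king  = torch.zeros(#vocab); king[1]  = 1   -- [1, 0, 0, 0, 0]

-- every pair of distinct words looks equally (un)related:
print(torch.dot(queen, king))               -- 0, as for any two different words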

In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words.

If I label the dimensions in a hypothetical word vector (there are no such pre-assigned labels in the algorithm of course), it might look a bit like the illustration in the reference link.

Such a vector comes to represent in some abstract way the ‘meaning’ of a word. And as we’ll see next, simply by examining a large corpus it’s possible to learn word vectors that are able to capture the relationships between words in a surprisingly expressive way. We can also use the vectors as inputs to a neural network.

Reasoning with word vectors

We find that the learned word representations in fact capture meaningful syntactic and semantic regularities in a very simple way. Specifically, the regularities are observed as constant vector offsets between pairs of words sharing a particular relationship. For example, if we denote the vector for word i as x_i, and focus on the singular/plural relation, we observe that x_apple - x_apples ≈ x_car - x_cars, x_family - x_families ≈ x_car - x_cars, and so on. Perhaps more surprisingly, we find that this is also the case for a variety of semantic relations, as measured by the SemEval 2012 task of measuring relation similarity.

The vectors are very good at answering analogy questions of the form a is to b as c is to ?. For example, man is to woman as uncle is to ? (aunt), using a simple vector offset method based on cosine distance.
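
A sketch of that vector-offset method (the embedding matrix here is random, purely to show the mechanics; with trained word2vec vectors the answer would come out as 'aunt'):

vocab = {'king', 'queen', 'man', 'woman', 'uncle', 'aunt'}
dim = 10
E = torch.randn(#vocab, dim)                 -- one (random) vector per word
idx = {}
for i, w in ipairs(vocab) do idx[w] = i end

-- answer "a is to b as c is to ?" by finding the word whose vector is closest
-- (by cosine similarity) to x_b - x_a + x_c
local function analogy(a, b, c)
  local target = E[idx[b]] - E[idx[a]] + E[idx[c]]
  local best, bestSim = nil, -math.huge
  for i, w in ipairs(vocab) do
    if w ~= a and w ~= b and w ~= c then
      local sim = target:dot(E[i]) / (target:norm() * E[i]:norm())
      if sim > bestSim then best, bestSim = w, sim end
    end
  end
  return best
end

print(analogy('man', 'woman', 'uncle'))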

For example, the reference link shows vector offsets for three word pairs illustrating the gender relation.

Ref

The amazing power of word vectors

Word Embedding Code in Torch

 

-- fragment from the Stack Overflow question linked below; self.llstm, self.rlstm
-- and self.my_module are modules defined elsewhere in that code
self.llstm = LSTM
self.rlstm = LSTM

local modules = nn.Parallel()
  :add(nn.LookupTable(self.vocab_size, self.emb_size))
  :add(nn.Collapse(2))
  :add(self.llstm)
  :add(self.my_module)

-- flatten all learnable parameters and their gradients into two vectors
self.params, self.grad_params = modules:getParameters()

Ref:
http://stackoverflow.com/questions/37126328/how-to-use-nn-lookuptable-in-torch
 

Multiple batches LSTM

Ref:

https://github.com/Element-Research/rnn/issues/74



require "rnn"
require "cunn"

torch.manualSeed(123)

batch_size= 2
maxLen = 4
wordVec = 5
nWords = 100
mode = 'CPU'

-- create random data with zeros as an empty indicator
inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords) -- word indices in [1, nWords]
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s

-- not all sequences have the same length; 0 is the padding placeholder
for i=1, batch_size do
    n_zeros = torch.random(maxLen-2) 
    inp1[{{i},{1, n_zeros}}] = torch.zeros(n_zeros)
end

-- make the first sequence the same as the second
inp1[{{2},{}}] = inp1[{{1},{}}]:clone()


lstm = nn.Sequential()
lstm:add(nn.LookupTableMaskZero(nWords, wordVec))  -- convert indices to word vectors (index 0 maps to a zero vector)
lstm:add(nn.SplitTable(1))  -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.MaskZero(nn.LSTM(wordVec, wordVec), 1))) -- Seq to Seq', 0-Seq to 0-Seq

if mode == 'GPU' then
    lstm:cuda()
    labels = labels:cuda()
    inp1 = inp1:cuda()
end

out = lstm:forward(inp1)

print('input 1', inp1[1])
print('lstm out 1', out[1])  


print('input 2', inp1[2])  -- should be the same as above
print('lstm out 2', out[2])  -- should be the same as above

Sequence-to-Sequence Networks

Ref: https://github.com/Element-Research/rnn/issues/155


--[[ Example of "coupled" separate encoder and decoder networks, e.g.
-- for sequence-to-sequence networks. ]]--
require 'rnn'

version = 1.2 -- refactored numerical gradient test into unit tests. Added training loop

local opt = {}
opt.learningRate = 0.1
opt.hiddenSize = 6
opt.vocabSize = 5
opt.seqLen = 3 -- length of the encoded sequence
opt.niter = 1000

--[[ Forward coupling: Copy encoder cell and output to decoder LSTM ]]--
local function forwardConnect(encLSTM, decLSTM,seqLen)
   decLSTM.userPrevOutput = nn.rnn.recursiveCopy(decLSTM.userPrevOutput, encLSTM.outputs[seqLen])
   decLSTM.userPrevCell = nn.rnn.recursiveCopy(decLSTM.userPrevCell, encLSTM.cells[seqLen])
end

--[[ Backward coupling: Copy decoder gradients to encoder LSTM ]]--
local function backwardConnect(encLSTM, decLSTM)
   encLSTM.userNextGradCell = nn.rnn.recursiveCopy(encLSTM.userNextGradCell, decLSTM.userGradPrevCell)
   encLSTM.gradPrevOutput = nn.rnn.recursiveCopy(encLSTM.gradPrevOutput, decLSTM.userGradPrevOutput)
end

-- Encoder
local enc = nn.Sequential()
enc:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local encLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
enc:add(nn.Sequencer(encLSTM))
enc:add(nn.SelectTable(-1))

-- Decoder
local dec = nn.Sequential()
dec:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local decLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
dec:add(nn.Sequencer(decLSTM))
dec:add(nn.Sequencer(nn.MaskZero(nn.Linear(opt.hiddenSize, opt.vocabSize),1)))
dec:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(),1)))
-- dec = nn.MaskZero(dec,1)

local criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(),1))

-- Some example data (the padded sequences actually used below have batch size 3; the unused encInSeq1/decInSeq1/decOutSeq1 variants have batch size 2)
local encInSeq1 = torch.Tensor({{1,2,3},{3,2,1}}) 
local decInSeq1 = torch.Tensor({{1,2,3,4},{2,4,3,1}})
local decOutSeq1 = torch.Tensor({{2,3,4,1},{4,3,1,2}})
decOutSeq1 = nn.SplitTable(1, 1):forward(decOutSeq1)
local encInSeq = torch.Tensor({{1,1,1,2,3},{0,0,1,2,3},{0,0,3,2,1}}) 
local decInSeq = torch.Tensor({{1,1,1,1,2,3},{1,2,3,4,0,0},{2,4,3,1,0,0}})
local decOutSeq = torch.Tensor({{1,1,1,2,3,2},{2,3,4,1,0,0},{4,3,1,2,0,0}})
decOutSeq = nn.SplitTable(1, 1):forward(decOutSeq)
print(decOutSeq)


print('encoder:')
for i,module in ipairs(enc:listModules()) do
  print(module)
  break
end
print('decoder:')
for i,module in ipairs(dec:listModules()) do
  print(module)
  break
end
local function train(i,encInSeq, decInSeq,decOutSeq)

   -- Forward pass
   local len = encInSeq:size(2)
   -- print(len)
   local encOut = enc:forward(encInSeq)
   forwardConnect(encLSTM, decLSTM,len)
   local decOut = dec:forward(decInSeq)
   -- print("decout:")
   -- for i = 1,#decOut do
     -- print(decOut[i])
   -- end
   local err = criterion:forward(decOut, decOutSeq)
   -- print(err) 
   print(string.format("Iteration %d ; NLL err = %f ", i, err))

   -- Backward pass

   local gradOutput = criterion:backward(decOut, decOutSeq)
   dec:backward(decInSeq, gradOutput)
   backwardConnect(encLSTM, decLSTM)
   local zeroTensor = torch.zeros(encOut:size()) -- dummy gradient w.r.t. the (unused) encoder output
   enc:backward(encInSeq, zeroTensor)

   dec:updateParameters(opt.learningRate)
   enc:updateParameters(opt.learningRate)
   enc:zeroGradParameters()
   dec:zeroGradParameters()
   dec:forget()
   enc:forget()
   encLSTM:recycle()
   decLSTM:recycle()
end
for i=1,1000 do
  train(i,encInSeq,decInSeq,decOutSeq)
  -- train(i,encInSeq1,decInSeq1,decOutSeq1)
end

 

 

narrow(dim, index, size) returns a new Tensor which is a narrowed version of the current one: dimension dim is narrowed from index to index + size - 1.

> x = torch.Tensor(5, 6):zero()
> print(x)

0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.Tensor of dimension 5x6]

> y = x:narrow(1, 2, 3) -- narrow dimension 1 from index 2 to index 2+3-1
> y:fill(1) -- fill with 1
> print(y)

 1  1  1  1  1  1
 1  1  1  1  1  1
 1  1  1  1  1  1
[torch.Tensor of dimension 3x6]

> print(x) -- memory in x has been modified!

 0  0  0  0  0  0
 1  1  1  1  1  1
 1  1  1  1  1  1
 1  1  1  1  1  1
 0  0  0  0  0  0
[torch.Tensor of dimension 5x6]

Class

https://github.com/torch/torch7/blob/master/doc/utility.md

[metatable] torch.class(name, [parentName], [module])

https://github.com/torch/class

Object Classes for Lua

This package provides simple object-oriented capabilities for Lua. Each class is defined with a metatable, which contains methods. Inheritance is achieved by setting metatables over metatables. Efficient type checking is provided.

Typical Example

local class = require 'class'

-- define some dummy A class
local A = class('A')

function A:__init(stuff)
  self.stuff = stuff
end

function A:run()
  print(self.stuff)
end

-- define some dummy B class, inheriting from A
local B = class('B', 'A')

function B:__init(stuff)
  A.__init(self, stuff) -- call the parent init
end

function B:run5()
  for i=1,5 do
    print(self.stuff)
  end
end

-- create some instances of both classes
local a = A('hello world from A')
local b = B('hello world from B')

-- run stuff
a:run()
b:run()
b:run5()

Documentation

First, require the package

local class = require 'class'

Note that class does not clutter the global namespace.

Class metatables are then created with class(name) or equivalently class.new(name).

local A = class('A')
local B = class('B', 'A') -- B inherit from A

You then have to fill up the returned metatable with methods.

function A:myMethod()
  -- do something
end

——————————————

Creates a new Torch class called name. If parentName is provided, the class will inherit parentName methods. A class is a table which has a particular metatable.

If module is not provided and if name is of the form package.className then the class className will be added to the specified package. In that case, package has to be a valid (and already loaded) package. If name does not contain any ., then the class will be defined in the global environment.

If the module table is provided, the class will be defined in this table at key className.

One [or two] (meta)tables are returned. These tables contain all the methods provided by the class [and its parent class if one has been provided]. After a call to torch.class() you have to properly fill up the metatable.

After the class definition is complete, a new object of class name is constructed with a call to name(). This call will first call the method __init() if it exists, passing along all arguments of name().

-- for naming convenience
do
   --- creates a class "Foo"
   local Foo = torch.class('Foo')

   --- the initializer
   function Foo:__init()
      self.contents = 'this is some text'
   end

   --- a method
   function Foo:print()
      print(self.contents)
   end

   --- another one
   function Foo:bip()
      print('bip')
   end

end

--- now create an instance of Foo
foo = Foo()

--- try it out
foo:print()

--- create a class torch.Bar which
--- inherits from Foo
do
   local Bar, parent = torch.class('torch.Bar', 'Foo')

   --- the initializer
   function Bar:__init(stuff)
      --- call the parent initializer on ourself
      parent.__init(self)

      --- do some stuff
      self.stuff = stuff
   end

   --- a new method
   function Bar:boing()
      print('boing!')
   end

   --- override parent's method
   function Bar:print()
      print(self.contents)
      print(self.stuff)
   end
end

--- create a new instance and use it
bar = torch.Bar('ha ha!')
bar:print() -- overridden method
bar:boing() -- child method
bar:bip()   -- parent's method

Narrow

https://github.com/torch/torch7/blob/master/doc/tensor.md
http://jucor.github.io/torch-doc-template/tensor.html#toc_33
http://torch7.readthedocs.io/en/rtd/maths/
https://github.com/torch/torch7/blob/master/doc/storage.md


Attention Model for CNN

Attention Model for RNN

https://github.com/harvardnlp/seq2seq-attn/blob/master/s2sa/models.lua

Important blog posts

Attention Mechanism
http://torch.ch/blog/2015/09/21/rmva.html
http://yanran.li/peppypapers/2015/10/07/survey-attention-model-1.html
https://www.quora.com/What-is-exactly-the-attention-mechanism-introduced-to-RNN-recurrent-neural-network-It-would-be-nice-if-you-could-make-it-easy-to-understand

Attention and Memory in Deep Learning and NLP

NNgraph

https://github.com/torch/nngraph

A network with containers

A net that uses container modules (like ParallelTable), which output a table of outputs.

require 'nngraph'

m = nn.Sequential()
m:add(nn.SplitTable(1))
m:add(nn.ParallelTable():add(nn.Linear(10, 20)):add(nn.Linear(10, 30)))
input = nn.Identity()()
input1, input2 = m(input):split(2)
m3 = nn.JoinTable(1)({input1, input2})

g = nn.gModule({input}, {m3})

indata = torch.rand(2, 10)
gdata = torch.rand(50)
g:forward(indata)
g:backward(indata, gdata)

graph.dot(g.fg, 'Forward Graph')
graph.dot(g.bg, 'Backward Graph')

Tensor

http://jucor.github.io/torch-doc-template/tensor.html

LSTM

http://kbullaughey.github.io/lstm-play/lstm/
