Text Similarity

 

 

https://aclweb.org/anthology/S/S16/S16-1170.pdf

http://ttic.uchicago.edu/~kgimpel/papers/he+etal.emnlp15.pdf

http://web.eecs.umich.edu/~honglak/naacl2016-dscnn.pdf

https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf

 

http://nlp.cs.berkeley.edu/pubs/FrancisLandau-Durrett-Klein_2016_EntityConvnets_paper.pdf

http://emnlp2014.org/papers/pdf/EMNLP2014181.pdf

http://arxiv.org/pdf/1503.08909v2.pdf

http://arxiv.org/pdf/1504.01561v1.pdf

 

Code

https://github.com/hohoCode/textSimilarityConvNet

 

 

Convolution for NLP: Temporal Convolution

(Convolutional Neural Networks for Sentence Classification)

https://github.com/harvardnlp/sent-conv-torch

https://github.com/FredericGodin/DynamicCNN

https://github.com/harvardnlp/seq2seq-attn

https://github.com/harvardnlp/sent-conv-torch/blob/master/TempConv.ipynb


Fully connected layer: advantages (links)

https://en.wikipedia.org/wiki/Convolutional_neural_network

http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/

http://stats.stackexchange.com/questions/182102/what-do-the-fully-connected-layers-do-in-cnns

https://www.quora.com/Why-are-fully-connected-layers-used-at-the-very-end-output-side-of-convolutional-NNs-Why-not-earlier


 

CPU to GPU

Ref

http://kbullaughey.github.io/lstm-play/2015/09/21/torch-and-gpu.html

https://github.com/torch/cunn

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

… or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)

Using a GPU in Torch

Using a GPU in Torch is incredibly easy. Getting set up is simply a matter of requiring the cutorch package and using the CudaTensor type for your tensors.

cutorch = require 'cutorch'
x = torch.CudaTensor(2,2):uniform(-1,1)

Now all of the operations that involve x will be computed on the GPU.

If you have a tensor that is not a CudaTensor but want to make it one, you can use the cuda() function to return a CudaTensor copy of the original:

x = torch.Tensor(2,2):zero()
xCuda = x:cuda()

You can see what type of tensor you have by inspecting it in the console:

th> x
 0  0
 0  0
[torch.DoubleTensor of size 2x2]
th> xCuda
 0  0
 0  0
[torch.CudaTensor of size 2x2]

You can also convert back to a CPU tensor:

th> y = xCuda:double()
th> y
 0  0
 0  0
[torch.DoubleTensor of size 2x2]

Keep in mind that the parameter matrices of the nn.Module objects also need to be configured for GPU use, as these contain internal tensors for storing parameters, and the forward/backward propagation state.

Lucky for us, these also have cuda() methods:

linearMap = nn.Linear(M,M):cuda()
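
Putting it together, a minimal sketch (with assumed sizes) of moving a small model, its criterion and a batch of data to the GPU before a forward pass:

require 'nn'
require 'cunn'

local M = 10
local model = nn.Sequential()
model:add(nn.Linear(M, M))
model:add(nn.Tanh())
model:add(nn.Linear(M, 2))
model:add(nn.LogSoftMax())
model:cuda()                                  -- parameters become CudaTensors

local criterion = nn.ClassNLLCriterion():cuda()

local input  = torch.CudaTensor(32, M):uniform()
local target = torch.CudaTensor(32):fill(1)   -- dummy class labels

local output = model:forward(input)           -- computed on the GPU
print(criterion:forward(output, target))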

Thoughts on torch

Ref:

http://kbullaughey.github.io/lstm-play/2015/09/21/thoughts-on-torch.html

Basically the only hard thing I need to do when developing with torch is thinking about tensor dimensions. It seems an inordinate amount of my brain cycles are consumed in this way. But I don’t fault torch for this, as I think it’s an unavoidable aspect of working with multi-dimensional tensors.

Lua

Lua also has many great things going for it, and by proxy these are also reasons why torch is great:

  1. Very fast execution time (very little reason to consider C++ or other compiled languages).
  2. Can be easily embedded in other applications.
  3. Nice profiler provided by LuaJIT.
  4. Because it is interpreted, interactive prototyping makes it easy to explore how things work.

Unfortunately, there are a number of not so fun aspects of lua:

  1. Feels primitive and very bare-bones compared to other scripting languages.
  2. Rather unhelpful stack traces.
  3. Debugging facilities seem rather lacking.
  4. nil. Because variables don’t need to be defined, spelling mistakes and other minor errors result in nil, which combined with poor stack traces sometimes makes it hard to locate the problem.

 

Show Line Numbers in Jupyter Notebook

Shortcut key: press L (in command mode) to toggle line numbers

 


HDF

Hierarchical Data Format (HDF) technologies are used to manage large and complex data collections and to ensure long-term access to HDF data.

HDF5

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

Ref

http://www.hdfgroup.org/

H5py

The h5py package is a Pythonic interface to the HDF5 binary data format.

It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.

 

You can use python-h5py.

sudo apt-get install python-h5py

And then in your Python file try:

import h5py

Ref

http://www.h5py.org/

http://stackoverflow.com/questions/24744969/installing-h5py-on-an-ubuntu-server


ML

http://www.cs.utah.edu/~piyush/teaching/cs5350.html

http://vis.lbl.gov/~romano/mlgroup/papers/linear-dim-red.pdf

https://www.cs.utah.edu/~piyush/teaching/

http://vis.lbl.gov/~romano/mlgroup/papers/

 

 

————————————————————–

Vision and NLP Conf

  • CVPR: IEEE Conference on Computer Vision and Pattern Recognition
  • ICCV: International Conference on Computer Vision
  • ECCV: European Conference on Computer Vision
  • NIPS: Neural Information Processing Systems
  • ICLR: International Conference on Learning Representations
  • ICML: International Conference on Machine Learning
  • EMNLP: Empirical Methods in Natural Language Processing
  • ACL: Association for Computational Linguistics
  • NAACL: The North American Chapter of the Association for Computational Linguistics
  • ACCV: Asian Conference on Computer Vision
  • IJCV: International Journal of Computer Vision

IP Conf

IEEE transactions on Image Processing
IEEE transactions on Signal Processing
IEEE transactions on Multimedia
ICIP
ICASSP
SIGIR
 ———————————————-
projects

Past CS229 Projects: Example projects from Stanford machine learning class

http://cs224d.stanford.edu/reports_2016.html

http://cs224d.stanford.edu/reports_2015.html

http://cs224d.stanford.edu/project.html

http://cs231n.stanford.edu/reports2016.html

http://cs231n.stanford.edu/project.html

Code For LSTM and CNN

TemporalConvolution Example

—————————————————————

Model :  Input-> CNN-> LSTM-> Softmax

——————————————————–

-- (Mini-batching using RNN)

require 'nn'
require 'rnn' -- provides Sequencer, BiSequencer and FastLSTM

local net = nn.Sequential()

net:add(nn.Sequencer(nn.TemporalConvolution(256,200,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.TemporalMaxPooling(2,2)))
net:add(nn.Sequencer(nn.TemporalConvolution(200,170,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.TemporalConvolution(170,150,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.BatchNormalization(150)))
net:add(nn.Sequencer(nn.Linear(150,120)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.BiSequencer(nn.FastLSTM(120,40),nn.FastLSTM(120,40)))
net:add(nn.Sequencer(nn.BatchNormalization(40*2)))
net:add(nn.Sequencer(nn.Linear(40*2,27)))
net:add(nn.Sequencer(nn.SoftMax()))


Ref paper : Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

http://arxiv.org/pdf/1512.02595v1.pdf

Group

https://groups.google.com/forum/#!topic/torch7/YAQcqminACY



 

This tutorial will focus on giving you working knowledge to implement and test a convolutional neural network with torch. If you have not yet setup your machine, please go back and give this a read before starting.

You can start iTorch by opening a console, navigating to <install dir>/torch/iTorch and typing ‘itorch notebook’. Once this is done, a webpage should pop up, which is served from your own local machine. Go ahead and click the ‘New Notebook’ button.

Because we’re just starting out, we will start with a very simple problem to solve. Suppose you have a signal which needs to be classified as either a square pulse or a triangular pulse. Each pulse is sampled over time. To make the problem slightly more challenging, let’s say the pulse is not always in the same place, and the pulse can have constrained but random height and width. There are several techniques we could use to solve this problem. We could do signal processing, such as taking the FFT, or we could code up our own custom filters. But that involves work, and also becomes impossible when faced with larger problems. So what do we do? We can build a convolutional neural network!

Convolutional Networks
Convolutional Layers
The network will start out with a 64×1 vector, which we can effectively call a 1-D vector with each value representing the signal strength at each point in time. Next we apply a convolution of those 64 points with ten kernels, each with 7 elements. These kernel weights will act as filters, or features. We don’t know yet what the values will be, since they will be learned as we train the network. Layers of the network that take an input and convolve one or more filters over it to create an output are called convolutional layers. Example:

Convolution 3×1 kernel, 8×1 input

Input: 2 4 3 6 5 3 7 6
Kernel values: -1 2 -1
Output: 3 -4 4 1 -6 5
Further explanation: (-1*2)+(2*4)+(-1*3) = 3

Pooling layer
After convolutional layers, there is frequently a pooling layer. This layer is used to reduce the problem size, and thus speed up training greatly. Typically, MaxPooling is used, which acts like a kind of convolution, except that it has a stride usually equal to the kernel size, and the ‘kernel’ just takes the maximum value of the input and outputs that maximum value. This is great for classification problems such as this, because the position of the signal isn’t very important, just whether it is square or triangular. So pooling layers throw away some positioning data, but make the problem smaller and easier to train. Example:

Max pooling layer, size 2, stride 2
Input: 3 5 7 6 3 4
Output: 5 7 4
Further explanation: Max(3,5) = 5, Max(7,6) = 7, Max(3,4) = 4

Activation Function
Neural networks achieve their power by introducing non-linearities into the system. Otherwise, networks just become big linear algebra problems, and there is no point in having many layers. In days past, the sigmoid used to be most common, however, recent breakthroughs have indicated that ReLU is a much better operator for deep neural networks. Basically, it is just ‘y = max(0,x)’. So if x is negative, y is 0, otherwise, y is equal to x. Example:

Input: 4 6 2 -4
Output: 4 6 2 0
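
The three toy examples above can be checked directly in Torch with nn.TemporalConvolution, nn.TemporalMaxPooling and nn.ReLU; a minimal sketch in which the kernel weights are set by hand to match the numbers above:

require 'nn'

-- Convolution: one 3-element kernel over an 8x1 input
local conv = nn.TemporalConvolution(1, 1, 3)        -- inputFrameSize=1, outputFrameSize=1, kW=3
conv.weight:copy(torch.Tensor{{-1, 2, -1}})
conv.bias:zero()
print(conv:forward(torch.Tensor{2,4,3,6,5,3,7,6}:view(8,1)))   -- 3 -4 4 1 -6 5

-- Max pooling, size 2, stride 2
print(nn.TemporalMaxPooling(2, 2):forward(torch.Tensor{3,5,7,6,3,4}:view(6,1)))  -- 5 7 4

-- ReLU
print(nn.ReLU():forward(torch.Tensor{4,6,2,-4}))    -- 4 6 2 0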

 

————————————————————————–

Awesome Example for TemporalConvolution

————————————————————–

First things first, be sure to include the neural network package.

-- First, be sure to require the 'nn' package for the neural network functions
require 'nn';

Next, we’ll need to create some training data. Neural networks require many examples in order to train, so we choose to generate 10000 example signals. This number may seem large, but remember that we have 4 randomized components to each wave; Type, height, width, start index. This translates to 2*6*21*6 = 1512 possible permutations. In real life, problems are much more complex.

-- Next, create the training data. We'll use 10000 samples for now
nExamples = 10000

trainset = {}
trainset.data = torch.Tensor(nExamples,64,1):zero() -- Data will be sized as 10000x64x1
trainset.label = torch.Tensor(nExamples):zero()     -- Use one dimensional tensor for label

--The network trainer expects an index metatable
setmetatable(trainset, 
{__index = function(t, i) 
    return {t.data[i], t.label[i]}  -- The trainer is expecting trainset[123] to be {data[123], label[123]}
    end}
);

--The network trainer expects a size function
function trainset:size() 
    return self.data:size(1) 
end

function GenerateTrainingSet()

    -- Time to prepare the training set with data
    -- At random, have data be either a triangular pulse, or a rectangular pulse
    -- Have randomness as to when the signal starts, ends, and how high it is
    for i=1,nExamples do
        curWaveType = math.random(1,2)      -- 1 for triangular signal, 2 for square pulse
        curWaveHeight = math.random(5,10)   -- how high is signal
        curWaveWidth = math.random(20,40)   -- how wide is signal
        curWaveStart = math.random(5,10)    -- when to start signal
    
        for j=1,curWaveStart-1 do
            trainset.data[i][j][1] = 0
        end
    
        if curWaveType==1 then   -- We are making a triangular wave
            delta = curWaveHeight / (curWaveWidth/2);
            for curIndex=1,curWaveWidth/2 do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * curIndex
            end
            for curIndex=(curWaveWidth/2)+1, curWaveWidth do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * (curWaveWidth-curIndex)
            end
            trainset.label[i] = 1
        else
            for j=1,curWaveWidth do
                trainset.data[i][curWaveStart-1+j][1] = curWaveHeight
            end
            trainset.label[i] = 2
        end
    end
end

GenerateTrainingSet()

Next, we will construct our neural network. Starting with 64×1 data going in, we will go through two Convolution-MaxPool-ReLU ‘layers’, followed by a two-layer fully connected network ending in two outputs. Because this is a classification problem, we’ll use log-probability output. Whichever output is greatest (closest to zero) is the selection of the network. The other output should have a negative value.

-- This is where we build the model
model = nn.Sequential()                       -- Create network

-- First convolution, using ten, 7-element kernels
model:add(nn.TemporalConvolution(1, 10, 7))   -- 64x1 goes in, 58x10 goes out
model:add(nn.TemporalMaxPooling(2))           -- 58x10 goes in, 29x10 goes out
model:add(nn.ReLU())                          -- non-linear activation function

-- Second convolution, using 5, 7-element kernels
model:add(nn.TemporalConvolution(10, 5, 7))   -- 29x10 goes in, 23x5 goes out
model:add(nn.TemporalMaxPooling(2))           -- 23x5 goes in, 11x5 goes out
model:add(nn.ReLU())                          -- non-linear activation function

-- After convolutional layers, time to do fully connected network
model:add(nn.View(11*5))                        -- Reshape network into 1D tensor

model:add(nn.Linear(11*5, 30))                  -- Fully connected layer, 55 inputs, 30 outputs
model:add(nn.ReLU())                            -- non-linear activation function

model:add(nn.Linear(30, 2))                     -- Final layer has 2 outputs. One for triangle wave, one for square
model:add(nn.ReLU())                            -- non-linear activation function
model:add(nn.LogSoftMax())                      -- log-probability output, since this is a classification problem

With torch, we can see the dimensions of a tensor by applying a ‘#’ before it. So at any time when constructing the network, you can create a partially complete network, and propagate a blank tensor through it and see what the dimension of the last layer is.

-- When building the network, we can test the shape of the output by sending in a dummy tensor
#model:forward(torch.Tensor(64,1))

Next, we set our criterion to nn.ClassNLLCriterion, which is suited to classification problems. We then create a trainer using the StochasticGradient descent algorithm, and set the learning rate and number of iterations. If the learning rate is too high, the network will not converge. If it is too low, the network will converge too slowly. So it takes practice to get this just right.

criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.01
trainer.maxIteration = 200 -- do 200 epochs of training

Finally, we train our model! Go grab a cup of coffee, it may take a while. Later we will focus on accelerating these training sessions with the GPU, but our network is so small right now that it isn’t practical to accelerate.

trainer:train(trainset)

We can see what an example output and label are below.

-- Lets see an example output
model:forward(trainset.data[123])

-- Lets see which label that is
trainset.label[123]

Let’s figure out how many of the examples are predicted correctly.

function TestTrainset()
    correct = 0
    for i=1,nExamples do
        local groundtruth = trainset.label[i]
        local prediction = model:forward(trainset.data[i])
        local confidences, indices = torch.sort(prediction, true)  -- sort in descending order
        if groundtruth == indices[1] then
            correct = correct + 1
        else
            --print("Incorrect! "..tostring(i))
        end
    end
    print(tostring(correct))
end

-- Lets see how many out of the 10000 samples we predict correctly!
TestTrainset()

Hopefully, that number reads 10,000. Next, let’s be sure our network is really well trained: generate new training sets and test them. Hopefully everything will be 10,000, but if there are some incorrect examples, go back and train some more. In real life we can suffer from a phenomenon called overfitting, where the model fits the training data too closely, but we will cover this in a later article. Try to train your network until it passes everything you can throw at it.

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    TestTrainset()
end

Great, you’ve done it! Now, let’s try to gain some understanding of what’s going on here. We created two convolutional layers, the first having ten 1×7 kernels, and the second convolutional layer having five 10×7 kernels. The reason I use itorch instead of the command-line torch interface is so I can easily inspect graphics. Let’s take a look at the filters in the first convolutional layer. We can see that each row is a filter.

require 'image'
itorch.image(model.modules[1].weight)

[Image: first-layer convolution kernels]

We can also see which neurons activate the most. You can propagate any input through the network with the :forward function, as demonstrated earlier. Then, we can visualize the outputs of the ReLU (or any) layers. For example, here is the output of the first ReLU layer. It is obvious that some filters are activating more than others.

itorch.image(model.modules[3].output)

[Image: output of the first ReLU layer]

Next, let’s take a look at the next ReLU layer’s output. Here we can see that the neurons in the 5th layer are by far the most active for this input. So we know that even if our filters look a little chaotic, neurons in a particular layer do activate and stand out. Finally, these values are sent to the fully connected neural network, which makes sense of what it means when different filters are activated in relation to other filters.

itorch.image(model.modules[6].output)

[Image: output of the second ReLU layer]

Now that we understand how different filters activate with certain inputs, let us introduce noise into the system and see how the neural network deals with this.

function IntroduceNoise()
    for i=1,nExamples do
        for j=1,64 do
            trainset.data[i][j] = trainset.data[i][j] + torch.normal(0,.25);
        end
    end
end

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    IntroduceNoise()
    TestTrainset()
end

After training my network for around 600 epochs, I was able to achieve 100% perfect signal categorization with the noisy inputs, even though I only trained on the noiseless inputs. Wow! This shows us that the network does indeed work, and is powerful enough to filter out the kind of noise that occurs in real-life data. Next, we will be ready for more interesting challenges!

-- To see the network's structure and variables
model.modules

Thanks to this site:

http://supercomputingblog.com/machinelearning/an-intro-to-convolutional-networks-in-torch/

————————————

Ref:

https://stackoverflow.com/questions/36771635/after-loading-a-trained-model-in-torch-how-to-use-this-loaded-model-to-classify

https://groups.google.com/forum/#!topic/torch7/XL30bTW6mNs

https://groups.google.com/forum/#!topic/torch7/jdZ15JjVLSw

https://groups.google.com/forum/#!topic/torch7/peOLx3tfuSQ

http://staff.ustc.edu.cn/~cheneh/paper_pdf/2014/Yi-Zheng-WAIM2014.pdf

————————————

Convolution Tut: Temporal and Spatial

http://torch.ch/torch3/matos/convolutions.pdf


 

-- 'vector' is assumed to hold pre-trained word vectors (one per row); d, H, K and dw
-- (embedding size, feature maps, kernel width, stride) are defined elsewhere.
require 'nn'

lookupTableLayer = nn.LookupTable(vector:size()[1], d)
for i=1,vector:size()[1] do
  lookupTableLayer.weight[i] = vector[i]
end
mlp=nn.Sequential();
mlp:add(lookupTableLayer)
mlp:add(nn.TemporalConvolution(d,H,K,dw))
mlp:add(nn.Tanh())
mlp:add(nn.Max(1))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(H,d))

Now, to train the network, I loop through every training example and for every example I call gradUpdate() which has this code (this is straight from the examples):

function gradUpdate(mlp, x, indexY, learningRate)
  local pred = mlp:forward(x)
  local gradCriterion = findGrad(pred, indexY) -- findGrad computes the criterion gradient w.r.t. the prediction (defined elsewhere)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate)
end

https://github.com/ganeshjawahar/torch-teacher/blob/master/stanford/model_nngraph.lua

-- encode the question
local question_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(question):annotate{name = 'question_word_lookup'}
local question_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))(nn.SplitTable(1, 2)(question_word_vectors)):annotate{name = 'question_encoder'}
local final_q_out = nn.Dropout(params.dropout)(nn.Unsqueeze(3)(nn.SelectTable(1)(question_encoder))) -- get the last step output

-- encode the passage
local passage_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(passage):annotate{name = 'passage_word_lookup'}
local passage_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))(nn.SplitTable(1, 2)(passage_word_vectors)):annotate{name = 'passage_encoder'}
local final_p_out = nn.Dropout(params.dropout)(nn.View(params.bsize, 1, 2 * params.hid_size)(nn.JoinTable(2)(passage_encoder))) -- combine the forward and backward rnns' output

-- LookupTableMaskZero maps index 0 to a zero vector:
l = nn.LookupTableMaskZero(3, 1)
print(l:forward(torch.LongTensor{1}))
print(l:forward(torch.LongTensor{0}))


https://github.com/chapternewscu/image-captioning-with-semantic-attention/blob/master/test_attention_weights_criterion.lua
function Seq2Seq:buildModel()
  self.encoder = nn.Sequential()
  self.encoder:add(nn.LookupTableMaskZero(self.vocabSize, self.hiddenSize))
  self.encoderLSTM = nn.FastLSTM(self.hiddenSize, self.hiddenSize):maskZero(1)
  self.encoder:add(nn.Sequencer(self.encoderLSTM))
  self.encoder:add(nn.Select(1,1))

  self.decoder = nn.Sequential()
  self.decoder:add(nn.LookupTableMaskZero(self.vocabSize, self.hiddenSize))
  self.decoderLSTM = nn.FastLSTM(self.hiddenSize, self.hiddenSize):maskZero(1)
  self.decoder:add(nn.Sequencer(self.decoderLSTM))
  self.decoder:add(nn.Sequencer(nn.MaskZero(nn.Linear(self.hiddenSize, self.vocabSize),1)))
  self.decoder:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(),1)))

  self.encoder:zeroGradParameters()
  self.decoder:zeroGradParameters()
end
Ref: https://github.com/Element-Research/rnn/issues/155
-- Encoder
local enc = nn.Sequential()
enc:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local encLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
enc:add(nn.Sequencer(encLSTM))
enc:add(nn.SelectTable(-1))

-- Decoder
local dec = nn.Sequential()
dec:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local decLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
dec:add(nn.Sequencer(decLSTM))
dec:add(nn.Sequencer(nn.MaskZero(nn.Linear(opt.hiddenSize, opt.vocabSize),1)))
dec:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(),1)))
-- dec = nn.MaskZero(dec,1)

Ref: https://groups.google.com/forum/#!topic/torch7/ZUu4KhBqZ_0

I implemented the model using Nicholas Leonard’s rnn package (https://github.com/Element-Research/rnn) as follows:
model = nn.Sequential()
model:add(nn.LookupTableMaskZero(vocabSize, embeddingSize))
model:add(nn.SplitTable(1, 2))
 
lstm = nn.MaskZero(
  nn.Sequencer(
  nn.Sequential()
  :add(nn.LSTM(embeddingSize,nHidden))
  :add(nn.Dropout())
  :add(nn.LSTM(nHidden,nHidden))
  :add(nn.Dropout())
  :add(nn.Linear(nHidden, vocabSize))
  :add(nn.LogSoftMax())
  ), 1)
 
model:add(lstm)
criterion = nn.SequencerCriterion(nn.ClassNLLCriterion()) -- not using SequencerCriterion as we only use the last output


Ref: https://github.com/Element-Research/rnn/issues/75

use MaskZeroCriterion

function newModelBuild(dictionarySize, nbfeatures, embeddingSize, rhoInput, rhoOutput, lktype, logsoftFlag)
local model=nn.Sequential()
local p=nn.ParallelTable() 
p:add(nn.Identity()) --  -> carries the tensor of features
local lkt=nn.LookupTable(dictionarySize, embeddingSize)
local weightmatrix
if lktype == 0 then 
    weightmatrix=torch.Tensor(dictionarySize,embeddingSize)
    for i=1,dictionarySize do
        for j=1,embeddingSize do
            weightmatrix[i][j]=torch.uniform(0,1)
        end
    end
    lkt.weight:copy(weightmatrix)
else 
    lkt.weight:fill(1.0/embeddingSize)
end
p:add(nn.Sequencer(lkt)) -- ->ListofTensor(batchSize X embeddingSize)
model:add(p)
local SliceList=nn.ConcatTable() -- purpose: create a list tensor created by joining   tensors
for i=1, rhoInput do   
    local Slice =nn.Sequential()
    SliceList:add(Slice)
    local cc=nn.ConcatTable()   -- contains the 2 tensors to join
    Slice:add(cc)
    local a=nn.Sequential()
    cc:add(a)
    a:add(nn.SelectTable(2)) -- we select list of tensor(i)
    a:add(nn.SelectTable(i))  -- we select a tensor(i)
    local b=nn.Sequential()
    cc:add(b)
    b:add(nn.SelectTable(1)) -- we select  tensorF
    Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
for i=rhoInput+1,rhoOutput do
    local Slice =nn.Sequential()
    SliceList:add(Slice)
    local cc=nn.ConcatTable()   -- contains the 2 tensors to join
    Slice:add(cc)
    local a=nn.Sequential()
    cc:add(a)
    a:add(nn.SelectTable(2)) -- we select list of tensor(i)
    a:add(nn.MaskZero(nn.SelectTable(i),1))  -- we select a tensor(i) : put at 0***********
    local b=nn.Sequential()
    cc:add(b)
    b:add(nn.SelectTable(1)) -- we select  tensorF
    Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
model:add(SliceList)
model:add(nn.Sequencer(nn.FastLSTM(embeddingSize+nbfeatures, embeddingSize, rhoOutput)))
model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
if logsoftFlag then model:add(nn.Sequencer(nn.LogSoftMax())) end
return model
end

 

lookuptable

 

REF:

http://stackoverflow.com/questions/29412658/torch-lookuptable-and-gradient-update

———————————————————-
I want to perform zero padding before TemporalConvolution (after the lookup table) in order to make sure that the input size is not less than the convolution window size. Here is my network:

model:add(nn.LookupTable(DICTIONARY_SIZE, DICTIONARY_VEC_DIMENTION))
-- padding should go here, e.g. model:add(nn.Padding(...))
model:add(nn.TemporalConvolution(DICTIONARY_VEC_DIMENTION, K, CONV_WINDOW_SIZE, 1))

This problem was solved by using LookupTableMaskZero from the rnn package.
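
A minimal sketch of that fix (sizes are assumptions): LookupTableMaskZero maps index 0 to an all-zero embedding, so zero-padded sequences can be passed on to TemporalConvolution safely.

require 'nn'
require 'rnn'

local lookup = nn.LookupTableMaskZero(100, 4)   -- vocabulary of 100, 4-dimensional embeddings
local padded = torch.LongTensor{0, 0, 5, 9}     -- two padding positions, then two real indices
print(lookup:forward(padded))                   -- the first two rows are all zeros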
ref

————————————————————————–

 

LRCN

https://github.com/garythung/torch-lrcn

 

CNN link

http://nn.readthedocs.io/en/rtd/convolution/#convolutional-layers

http://nn.readthedocs.io/en/rtd/convolution/index.html#spatialconvolution

 

 

lookup table

http://torch5.sourceforge.net/manual/nn/index-2-5-5.html

https://stackoverflow.com/questions/37748421/lstm-on-top-of-cnn

 

local function create_network()
  local x                = nn.Identity()()
  local y                = nn.Identity()()
  local prev_s           = nn.Identity()()
  local i                = {[0] = LookupTable(params.vocab_size,
                                                    params.rnn_size)(x)}
  local next_s           = {}
  local split         = {prev_s:split(2 * params.layers)}
  for layer_idx = 1, params.layers do
    local prev_c         = split[2 * layer_idx - 1]
    local prev_h         = split[2 * layer_idx]
    local dropped        = nn.Dropout(params.dropout)(i[layer_idx - 1])
    local next_c, next_h = lstm(dropped, prev_c, prev_h)
    table.insert(next_s, next_c)
    table.insert(next_s, next_h)
    i[layer_idx] = next_h
  end
  local h2y              = nn.Linear(params.rnn_size, params.vocab_size)
  local dropped          = nn.Dropout(params.dropout)(i[params.layers])
  local pred             = nn.LogSoftMax()(h2y(dropped))
  local err              = nn.ClassNLLCriterion()({pred, y})
  local module           = nn.gModule({x, y, prev_s},
                                      {err, nn.Identity()(next_s)})
  module:getParameters():uniform(-params.init_weight, params.init_weight)
  return transfer_data(module)
end

Misc

Fix the “Firefox is already running” issue in Linux

  1. First find the process id of firefox using the following command in any directory:
    pidof firefox
  2. Kill firefox process using the following command in any directory:
    kill [firefox pid]

Then start firefox again.

Or you can do the same thing in just one command. As don_crissti said:

kill $(pidof firefox)



Ref:
http://unix.stackexchange.com/questions/78689/fix-firefox-is-already-running-issue-in-linux

Word Embedding

Nice explanation of word embeddings and related deep NLP models using Torch:

https://devblogs.nvidia.com/parallelforall/understanding-natural-language-deep-neural-networks-using-torch/

Word embeddings are not unique to neural networks; they are common to all word-level neural language models. Embeddings are stored in a simple lookup table (or hash table), that given a word, returns the embedding (which is an array of numbers). Figure 1 (check in ref link) shows an example.

Word embeddings are usually initialized to random numbers (and learned during the training phase of the neural network), or initialized from previously trained models over large texts like Wikipedia.
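
A toy sketch of that lookup-table view (vocabulary, indices and sizes are made up for illustration; a real model would learn or load these weights):

require 'nn'

local word2index = {the = 1, cat = 2, sat = 3}   -- word -> row index in the table
local embed = nn.LookupTable(3, 4)               -- 3 words, 4-dimensional embeddings

local function embedding(word)
  return embed:forward(torch.LongTensor{word2index[word]})
end

print(embedding('cat'))                          -- the 1x4 array of numbers for 'cat'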

Feed-forward Convolutional Neural Networks

Convolutional Neural Networks (ConvNets), which were covered in a previous Parallel Forall post by Evan Shelhamer, have enjoyed wide success in the last few years in several domains including images, video, audio and natural language processing.

When applied to images, ConvNets usually take raw image pixels as input, interleaving convolution layers along with pooling layers with non-linear functions in between, followed by fully connected layers. Similarly, for language processing, ConvNets take the outputs of word embeddings as input, and then apply interleaved convolution and pooling operations, followed by fully connected layers. Figure 2 shows an example ConvNet applied to sentences.


Recurrent Neural Networks (RNN)

Convolutional Neural Networks—and more generally, feed-forward neural networks—do not traditionally have a notion of time or experience unless you explicitly pass samples from the past as input. After they are trained, given an input, they treat it no differently when shown the input the first time or the 100th time. But to tackle some problems, you need to look at past experiences and give a different answer.

If you send sentences word-by-word into a feed-forward network, asking it to predict the next word, it will do so, but without any notion of the current context. The animation in Figure 3 shows why context is important. Clearly, without context, you can produce sentences that make no sense. You can have context in feed-forward networks, but it is much more natural to add a recurrent connection.

A Recurrent neural network has the capability to give itself feedback from past experiences. Apart from all the neurons in the network, it maintains a hidden state that changes as it sees different inputs. This hidden state is analogous to short-term memory. It remembers past experiences and bases its current answer on both the current input as well as past experiences. An illustration is shown in Figure 4(check in ref link ).
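
A tiny sketch of that statefulness using the rnn package’s FastLSTM (sizes are arbitrary): feeding the same input twice gives different outputs because the hidden state has changed in between.

require 'rnn'

local lstm = nn.FastLSTM(3, 3)
local x = torch.randn(3)

print(lstm:forward(x))   -- step 1
print(lstm:forward(x))   -- step 2: same input, different output (hidden state carried over)
lstm:forget()            -- reset the hidden state between sequences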

Long Short Term Memory (LSTM)

RNNs keep context in their hidden state (which can be seen as memory). However, classical recurrent networks forget context very fast. They take into account very few words from the past while doing prediction. Here is an example of a language modelling problem that requires longer-term memory.

I bought an apple … I am eating the _____

The probability of the word “apple” should be much higher than any other edible like “banana” or “spaghetti”, because the previous sentence mentioned that you bought an “apple”. Furthermore, any edible is a much better fit than non-edibles like “car”, or “cat”.

Long Short Term Memory (LSTM) [6] units try to address the problem of such long-term dependencies. LSTM has multiple gates that act as a differentiable RAM memory. Access to memory cells is guarded by “read”, “write” and “erase” gates. Information stored in memory cells is available to the LSTM for a much longer time than in a classical RNN, which allows the model to make more context-aware predictions. An LSTM unit is shown in Figure 5.

Exactly how LSTM works is unclear, and fully understanding it is a topic of contemporary research. However, it is known that LSTM outperforms conventional RNNs on many tasks.

Torch + cuDNN + cuBLAS: Implementing ConvNets and Recurrent Nets efficiently

Torch is a scientific computing framework with packages for neural networks and optimization (among hundreds of others). It is based on the Lua language, which is similar to JavaScript and is treated as a wrapper for optimized C/C++ and CUDA code.

At the core of Torch is a powerful tensor library similar to Numpy. The Torch tensor library has both CPU and GPU backends. The neural networks package in torch implements modules, which are different kinds of neuron layers, and containers, which can have several modules within them. Modules are like Lego blocks, and can be plugged together to form complicated neural networks.

Each module implements a function and its derivative. This makes it easy to calculate the derivative of any neuron in the network with respect to the objective function of the network (via the chain rule). The objective function is simply a mathematical formula to calculate how well a model is doing on the given task. Usually, the smaller the objective, the better the model performs.

The following small example of modules shows how to calculate the element-wise Tanh of an input matrix, by creating an nn.Tanh module and passing the input through it. We calculate the derivative with respect to the objective by passing it in the backward direction.

require 'nn'
input = torch.randn(100)
m = nn.Tanh()
output = m:forward(input)
ObjectiveDerivative = torch.randn(100) -- placeholder for the objective's gradient w.r.t. the output
InputDerivative = m:backward(input, ObjectiveDerivative)

Implementing the ConvNet shown in Figure 2 is also very simple with Torch. In this example, we put all the modules into a Sequential container that chains the modules one after the other.

nWordsInDictionary = 100000
embeddingSize = 100
sentenceLength = 5
m = nn.Sequential() -- a container that chains modules one after another
m:add(nn.LookupTable(nWordsInDictionary, embeddingSize))
m:add(nn.TemporalConvolution(embeddingSize, 150, sentenceLength)) -- argument order corrected: (inputFrameSize, outputFrameSize, kernel width)
m:add(nn.Max(1))
m:add(nn.Linear(150, 1024))
m:add(nn.HardTanh())
m:add(nn.Linear(1024, 2)) -- output size was left blank in the original snippet; 2 is an assumed number of classes

m:cuda() -- transfer the model to GPU

This ConvNet has :forward and :backward functions that allow you to train your network (on CPUs or GPUs). Here we transfer it to the GPU by calling m:cuda().

An extension to the nn package is the nngraph package which lets you build arbitrary acyclic graphs of neural networks. nngraph makes it easier to build complicated modules such as the LSTM memory unit, as the following example code demonstrates.

local function lstm(i, prev_c, prev_h)
  local function new_input_sum()
    local i2h            = nn.Linear(params.rnn_size, params.rnn_size)
    local h2h            = nn.Linear(params.rnn_size, params.rnn_size)
    return nn.CAddTable()({i2h(i), h2h(prev_h)})
  end
  local in_gate          = nn.Sigmoid()(new_input_sum())
  local forget_gate      = nn.Sigmoid()(new_input_sum())
  local in_gate2         = nn.Tanh()(new_input_sum())
  local next_c           = nn.CAddTable()({
    nn.CMulTable()({forget_gate, prev_c}),
    nn.CMulTable()({in_gate,     in_gate2})
  })
  local out_gate         = nn.Sigmoid()(new_input_sum())
  local next_h           = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
  return next_c, next_h
end

With these few lines of code we can create powerful state-of-the-art neural networks, ready for execution on CPUs or GPUs with good efficiency.

cuBLAS, and more recently cuDNN, have accelerated deep learning research quite significantly, and the recent success of deep learning can be partly attributed to these awesome libraries from NVIDIA. cuBLAS is automatically used by Torch for performing BLAS operations such as matrix multiplications, and accelerates neural networks significantly compared to CPUs.

To use NVIDIA cuDNN in Torch, simply replace the prefix nn. with cudnn.. cuDNN accelerates the training of neural networks compared to Torch’s default CUDA backend (sometimes up to 30%) and is often several orders of magnitude faster than using CPUs.
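
For example, a convolution layer built with the default backend and with cuDNN (a sketch; it assumes the cudnn.torch package and a CUDA-capable GPU):

require 'cunn'
require 'cudnn'

local m1 = nn.SpatialConvolution(3, 16, 5, 5):cuda()     -- default CUDA backend
local m2 = cudnn.SpatialConvolution(3, 16, 5, 5):cuda()  -- cuDNN backend, same arguments

local x = torch.CudaTensor(8, 3, 32, 32):uniform()
print(m1:forward(x):size())   -- 8 x 16 x 28 x 28
print(m2:forward(x):size())   -- same shape, typically faster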

For language modeling, we’ve implemented an RNN-LSTM neural network [9] using Torch. It gives state-of-the-art results on a standard quality metric called perplexity. The full source of this implementation is available here.

We compare the training time of the network on an Intel Core i7 2.6 GHZ vs accelerating it on an NVIDIA GeForce GTX 980 GPU. Table 2 shows the training times and GPU speedups for a small RNN and a larger RNN.

Table 2: Training times of a state-of-the-art recurrent network with LSTM cells on CPU vs GPU.

Conventional Neural Network

Figure 1: Conventional Neural Network

2.1 Lookup Table

The idea of distributed representation for symbolic data is one of the most important reasons why the neural network works. It was proposed by Hinton [11] and has been a research hot spot for more than twenty years [1, 6, 21, 16]. Formally, in the Chinese word segmentation task, we have a character dictionary D of size |D|. Unless otherwise specified, the character dictionary is extracted from the training set and unknown characters are mapped to a special symbol that is not used elsewhere. Each character c ∈ D is represented as a real-valued vector (character embedding) Embed(c) ∈ R^d, where d is the dimensionality of the vector space. The character embeddings are then stacked into an embedding matrix M ∈ R^(d×|D|). For a character c ∈ D that has an associated index k, the corresponding character embedding Embed(c) ∈ R^d is retrieved by the Lookup Table layer as shown in Figure 1:

Embed(c) = M e_k    (1)

Here e_k ∈ R^|D| is a binary vector which is zero in all positions except at the k-th index. The Lookup Table layer can be seen as a simple projection layer where the character embedding for each context character is obtained by a table lookup operation according to its index. The embedding matrix M is initialized with small random numbers and trained by back-propagation. We will analyze the effect of character embeddings in more detail in Section 4.
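
A small Torch sketch of Equation (1), with assumed sizes, showing that the lookup is the same as multiplying the embedding matrix by the one-hot vector e_k:

require 'nn'

local D, d, k = 6, 4, 3                    -- |D|, embedding dimension, character index
local lookup = nn.LookupTable(D, d)

local ek = torch.zeros(D); ek[k] = 1       -- one-hot vector e_k
local viaMatrix = lookup.weight:t() * ek   -- M e_k, where M = weight^T is d x |D|
local viaLookup = lookup:forward(torch.LongTensor{k})[1]

print(viaMatrix)
print(viaLookup)                           -- the same d-dimensional embedding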

2.2 Tag Scoring

The most common tagging approach is the window approach. The window approach assumes that the tag of a character largely depends on its neighboring characters. Given an input sentence c[1:n], a window of size w slides over the sentence from character c_1 to c_n. We set w = 5 in all experiments. As shown in Figure 1, at position c_i, 1 ≤ i ≤ n, the context characters are fed into the Lookup Table layer. The characters exceeding the sentence boundaries are mapped to one of two special symbols, namely "start" and "end" symbols. The character embeddings extracted by the Lookup Table layer are then concatenated into a single vector a ∈ R^H1, where H1 = w·d is the size of Layer 1. Then a is fed into the next layer, which performs a linear transformation followed by an element-wise activation function g such as tanh, which is used in our experiments:

h = g(W1 a + b1)    (2)

where W1 ∈ R^(H2×H1), b1 ∈ R^(H2×1), h ∈ R^H2. H2 is a hyper-parameter which is the number of hidden units in Layer 2. Given a set of tags T of size |T|, a similar linear transformation is performed except that no non-linear function follows:

f(t | c[i-2:i+2]) = W2 h + b2    (3)

where W2 ∈ R^(|T|×H2), b2 ∈ R^(|T|×1). f(t | c[i-2:i+2]) ∈ R^|T| is the score vector for each possible tag. In Chinese word segmentation, the most prevalent tag set T is the BMES tag set, which uses 4 tags to carry word boundary information: B, M, E and S denote the Beginning, the Middle, the End of a word, and a Single-character word, respectively. We use this tag set in our method.
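
The whole window-approach scorer maps naturally onto a small nn pipeline; a sketch with assumed sizes (w = 5, d = 50, H2 = 300, |T| = 4 for BMES):

require 'nn'

local D, w, d, H2, nTags = 5000, 5, 50, 300, 4

local net = nn.Sequential()
net:add(nn.LookupTable(D, d))       -- w character indices -> w x d embeddings
net:add(nn.View(w * d))             -- concatenate into the vector a (H1 = w*d)
net:add(nn.Linear(w * d, H2))       -- W1 a + b1
net:add(nn.Tanh())                  -- g(.)
net:add(nn.Linear(H2, nTags))       -- W2 h + b2: tag scores f(t | c[i-2:i+2])

local window = torch.LongTensor{12, 7, 3, 99, 45}   -- indices of c[i-2..i+2]
print(net:forward(window))          -- 4 tag scores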

2.3 Model Training and Inference

Despite sharing the commonalities mentioned above, previous work models the segmentation task differently and therefore uses different training and inference procedures. Mansur et al. [15] modeled Chinese word segmentation as a series of classification tasks at each position of the sentence, in which the tag score is transformed into a probability using the softmax function:

p(t_i | c[i-2:i+2]) = exp( f(t_i | c[i-2:i+2]) ) / Σ_{t'} exp( f(t' | c[i-2:i+2]) )

The model is then trained in MLE style, maximizing the log-likelihood of the tagged data. Obviously, it is a local model which cannot capture the dependency between tags and does not support inferring the tag sequence globally.

To model the tag dependency, previous neural network models [6, 35] introduce a transition score A_ij for jumping from tag i ∈ T to tag j ∈ T. For an input sentence c[1:n] with a tag sequence t[1:n], a sentence-level score is then given by the sum of transition and network scores:

s(c[1:n], t[1:n], θ) = Σ_{i=1..n} ( A_{t_{i-1}, t_i} + f_θ(t_i | c[i-2:i+2]) )    (4)

where f_θ(t_i | c[i-2:i+2]) indicates the score output for tag t_i at the i-th character by the network with parameters θ = (M, A, W1, b1, W2, b2). Given the sentence-level score, Zheng et al. [35] proposed a perceptron-style training algorithm inspired by the work of Collins [5]. Compared with Mansur et al. [15], their model is a global one where training and inference are performed at sentence level.

Workable as these methods seem, one of their limitations is that the tag-tag interaction and the neural network are modeled separately. The simple tag-tag transition neglects the impact of context characters and thus limits the ability to capture flexible interactions between tags and context characters. Moreover, the simple non-linear transformation in equation (2) is also too weak to model the complex interactional effects in Chinese word segmentation.

 Ref:

https://www.aclweb.org/anthology/P/P14/P14-1028.xhtml



REF

http://cseweb.ucsd.edu/~dasgupta/254-deep/stefanos.pdf

Natural Language Processing (almost) from Scratch

http://resola.ai/dev/

https://iksinc.wordpress.com/tag/continuous-bag-of-words-cbow/

 


The final layer of the network has one node for each candidate tag, each output is interpreted as the score for the associated tag.

 

 

What is a word vector?

At one level, it’s simply a vector of weights. In a simple 1-of-N (or ‘one-hot’) encoding every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one, and all other elements are zero.

Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as the vector (0, 1, 0, 0, 0): a one in the position corresponding to ‘Queen’ and zeros everywhere else.

Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.

In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words.

If I label the dimensions in a hypothetical word vector (there are no such pre-assigned labels in the algorithm of course), it might look a bit like this:

Such a vector comes to represent in some abstract way the ‘meaning’ of a word. And as we’ll see next, simply by examining a large corpus it’s possible to learn word vectors that are able to capture the relationships between words in a surprisingly expressive way. We can also use the vectors as inputs to a neural network.

Reasoning with word vectors

We find that the learned word representations in fact capture meaningful syntactic and semantic regularities in a very simple way. Specifically, the regularities are observed as constant vector offsets between pairs of words sharing a particular relationship. For example, if we denote the vector for word i as x_i and focus on the singular/plural relation, we observe that x_apple - x_apples ≈ x_car - x_cars, x_family - x_families ≈ x_car - x_cars, and so on. Perhaps more surprisingly, we find that this is also the case for a variety of semantic relations, as measured by the SemEval 2012 task of measuring relation similarity.

The vectors are very good at answering analogy questions of the form a is to b as c is to ?. For example, man is to woman as uncle is to ? (aunt), using a simple vector offset method based on cosine distance.
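
A toy sketch of the vector-offset method (random embeddings here, so the answer is meaningless; with trained word2vec vectors the nearest word to b - a + c under cosine similarity would be the analogy answer):

require 'torch'

local vocab = {'king', 'queen', 'man', 'woman', 'uncle', 'aunt'}
local emb = {}
for _, w in ipairs(vocab) do emb[w] = torch.randn(10) end

-- answer "a is to b as c is to ?" by finding the word closest to b - a + c
local function analogy(a, b, c)
  local target = emb[b] - emb[a] + emb[c]
  local best, bestScore = nil, -math.huge
  for _, w in ipairs(vocab) do
    if w ~= a and w ~= b and w ~= c then
      local score = target:dot(emb[w]) / (target:norm() * emb[w]:norm())  -- cosine similarity
      if score > bestScore then best, bestScore = w, score end
    end
  end
  return best
end

print(analogy('man', 'woman', 'uncle'))  -- with real vectors this should be 'aunt'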

For example, here are vector offsets for three word pairs illustrating the gender relation:

Ref

The amazing power of word vectors

Word Embedding Code In torch

 

lookuptable
self.llstm = LSTM
self.rlstm = LSTM

local modules = nn.Parallel()
  :add(nn.LookupTable(self.vocab_size, self.emb_size))
  :add(nn.Collapse(2))
  :add(self.llstm)
  :add(self.my_module)

self.params, self.grad_params = modules:getParameters()

ref
http://stackoverflow.com/questions/37126328/how-to-use-nn-lookuptable-in-torch
 

Multiple batches LSTM

ref

https://github.com/Element-Research/rnn/issues/74



require "rnn"
require "cunn"

torch.manualSeed(123)

batch_size= 2
maxLen = 4
wordVec = 5
nWords = 100
mode = 'CPU'

-- create random data with zeros as empty indicator
inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords) -- 
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s

-- not all sequences have the same length, 0 placeholder
for i=1, batch_size do
    n_zeros = torch.random(maxLen-2) 
    inp1[{{i},{1, n_zeros}}] = torch.zeros(n_zeros)
end

-- make the first sequence the same as the second
inp1[{{2},{}}] = inp1[{{1},{}}]:clone()


lstm = nn.Sequential()
lstm:add(nn.LookupTableMaskZero(10000, wordVec, batch_size))  -- convert indices to word vectors
lstm:add(nn.SplitTable(1))  -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.MaskZero(nn.LSTM(wordVec, wordVec), 1))) -- Seq to Seq', 0-Seq to 0-Seq

if mode == 'GPU' then
    lstm:cuda()
    criterion:cuda()
    labels = labels:cuda()
    inp1 = inp1:cuda()
end

out = lstm:forward(inp1)

print('input 1', inp1[1])
print('lstm out 1', out[1])  


print('input 2', inp1[2])  -- should be the same as above
print('lstm out 2', out[2])  -- should be the same as above


Sequence-to-sequence networks

ref https://github.com/Element-Research/rnn/issues/155


--[[ Example of "coupled" separate encoder and decoder networks, e.g.
-- for sequence-to-sequence networks. ]]--
require 'rnn'

version = 1.2 -- refactored numerical gradient test into unit tests. Added training loop

local opt = {}
opt.learningRate = 0.1
opt.hiddenSize = 6
opt.vocabSize = 5
opt.seqLen = 3 -- length of the encoded sequence
opt.niter = 1000

--[[ Forward coupling: Copy encoder cell and output to decoder LSTM ]]--
local function forwardConnect(encLSTM, decLSTM,seqLen)
   decLSTM.userPrevOutput = nn.rnn.recursiveCopy(decLSTM.userPrevOutput, encLSTM.outputs[seqLen])
   decLSTM.userPrevCell = nn.rnn.recursiveCopy(decLSTM.userPrevCell, encLSTM.cells[seqLen])
end

--[[ Backward coupling: Copy decoder gradients to encoder LSTM ]]--
local function backwardConnect(encLSTM, decLSTM)
   encLSTM.userNextGradCell = nn.rnn.recursiveCopy(encLSTM.userNextGradCell, decLSTM.userGradPrevCell)
   encLSTM.gradPrevOutput = nn.rnn.recursiveCopy(encLSTM.gradPrevOutput, decLSTM.userGradPrevOutput)
end

-- Encoder
local enc = nn.Sequential()
enc:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local encLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
enc:add(nn.Sequencer(encLSTM))
enc:add(nn.SelectTable(-1))

-- Decoder
local dec = nn.Sequential()
dec:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local decLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
dec:add(nn.Sequencer(decLSTM))
dec:add(nn.Sequencer(nn.MaskZero(nn.Linear(opt.hiddenSize, opt.vocabSize),1)))
dec:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(),1)))
-- dec = nn.MaskZero(dec,1)

local criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(),1))

-- Some example data (batch size 2 for the first set, batch size 3 for the second)
local encInSeq1 = torch.Tensor({{1,2,3},{3,2,1}}) 
local decInSeq1 = torch.Tensor({{1,2,3,4},{2,4,3,1}})
local decOutSeq1 = torch.Tensor({{2,3,4,1},{4,3,1,2}})
decOutSeq1 = nn.SplitTable(1, 1):forward(decOutSeq1)
local encInSeq = torch.Tensor({{1,1,1,2,3},{0,0,1,2,3},{0,0,3,2,1}}) 
local decInSeq = torch.Tensor({{1,1,1,1,2,3},{1,2,3,4,0,0},{2,4,3,1,0,0}})
local decOutSeq = torch.Tensor({{1,1,1,2,3,2},{2,3,4,1,0,0},{4,3,1,2,0,0}})
decOutSeq = nn.SplitTable(1, 1):forward(decOutSeq)
print(decOutSeq)


print('encoder:')
for i,module in ipairs(enc:listModules()) do
  print(module)
  break
end
print('decoder:')
for i,module in ipairs(dec:listModules()) do
  print(module)
  break
end
local function train(i,encInSeq, decInSeq,decOutSeq)

   -- Forward pass
   local len = encInSeq:size(2)
   -- print(len)
   local encOut = enc:forward(encInSeq)
   forwardConnect(encLSTM, decLSTM,len)
   local decOut = dec:forward(decInSeq)
   -- print("decout:")
   -- for i = 1,#decOut do
     -- print(decOut[i])
   -- end
   local err = criterion:forward(decOut, decOutSeq)
   -- print(err) 
   print(string.format("Iteration %d ; NLL err = %f ", i, err))

   -- Backward pass

   local gradOutput = criterion:backward(decOut, decOutSeq)
   dec:backward(decInSeq, gradOutput)
   backwardConnect(encLSTM, decLSTM)
   local zeroTensor = encOut:clone():zero() -- zero gradient w.r.t. the encoder output
   enc:backward(encInSeq, zeroTensor)

   dec:updateParameters(opt.learningRate)
   enc:updateParameters(opt.learningRate)
   enc:zeroGradParameters()
   dec:zeroGradParameters()
   dec:forget()
   enc:forget()
   encLSTM:recycle()
   decLSTM:recycle()
end
for i=1,1000 do
  train(i,encInSeq,decInSeq,decOutSeq)
  -- train(i,encInSeq1,decInSeq1,decOutSeq1)
end

 

 

Returns a new Tensor which is a narrowed version of the current one: the dimension dim is narrowed from index to index+size-1.

> x = torch.Tensor(5, 6):zero()
> print(x)

0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.Tensor of dimension 5x6]

> y = x:narrow(1, 2, 3) -- narrow dimension 1 from index 2 to index 2+3-1
> y:fill(1) -- fill with 1
> print(y)

 1  1  1  1  1  1
 1  1  1  1  1  1
 1  1  1  1  1  1
[torch.Tensor of dimension 3x6]

> print(x) -- memory in x has been modified!

 0  0  0  0  0  0
 1  1  1  1  1  1
 1  1  1  1  1  1
 1  1  1  1  1  1
 0  0  0  0  0  0
[torch.Tensor of dimension 5x6]

Class

https://github.com/torch/torch7/blob/master/doc/utility.md

[metatable] torch.class(name, [parentName], [module])

https://github.com/torch/class

Object Classes for Lua

This package provide simple object-oriented capabilities to Lua. Each class is defined with a metatable, which contains methods. Inheritance is achieved by setting metatables over metatables. An efficient type checking is provided.

Typical Example

local class = require 'class'

-- define some dummy A class
local A = class('A')

function A:__init(stuff)
  self.stuff = stuff
end

function A:run()
  print(self.stuff)
end

-- define some dummy B class, inheriting from A
local B = class('B', 'A')

function B:__init(stuff)
  A.__init(self, stuff) -- call the parent init
end

function B:run5()
  for i=1,5 do
    print(self.stuff)
  end
end

-- create some instances of both classes
local a = A('hello world from A')
local b = B('hello world from B')

-- run stuff
a:run()
b:run()
b:run5()

Documentation

First, require the package

local class = require 'class'

Note that class does not clutter the global namespace.

Class metatables are then created with class(name) or equivalently class.new(name).

local A = class('A')
local B = class('B', 'A') -- B inherit from A

You then have to fill-up the returned metatable with methods.

function A:myMethod()
  -- do something
end

——————————————

Creates a new Torch class called name. If parentName is provided, the class will inherit parentName methods. A class is a table which has a particular metatable.

If module is not provided and if name is of the form package.className then the class className will be added to the specified package. In that case, package has to be a valid (and already loaded) package. If name does not contain any ., then the class will be defined in the global environment.

If a module table is provided, the class will be defined in this table at key className.

One [or two] (meta)tables are returned. These tables contain all the methods provided by the class [and its parent class if it has been provided]. After a call to torch.class() you have to fill up the metatable properly.

After the class definition is complete, constructing a new instance of name is achieved by a call to name(). This call will first call the method __init() if it exists, passing along all arguments of name().

-- for naming convenience
do
   --- creates a class "Foo"
   local Foo = torch.class('Foo')

   --- the initializer
   function Foo:__init()
      self.contents = 'this is some text'
   end

   --- a method
   function Foo:print()
      print(self.contents)
   end

   --- another one
   function Foo:bip()
      print('bip')
   end

end

--- now create an instance of Foo
foo = Foo()

--- try it out
foo:print()

--- create a class torch.Bar which
--- inherits from Foo
do
   local Bar, parent = torch.class('torch.Bar', 'Foo')

   --- the initializer
   function Bar:__init(stuff)
      --- call the parent initializer on ourself
      parent.__init(self)

      --- do some stuff
      self.stuff = stuff
   end

   --- a new method
   function Bar:boing()
      print('boing!')
   end

   --- override parent's method
   function Bar:print()
      print(self.contents)
      print(self.stuff)
   end
end

--- create a new instance and use it
bar = torch.Bar('ha ha!')
bar:print() -- overridden method
bar:boing() -- child method
bar:bip()   -- parent's method

Narrow

https://github.com/torch/torch7/blob/master/doc/tensor.md
http://jucor.github.io/torch-doc-template/tensor.html#toc_33
http://torch7.readthedocs.io/en/rtd/maths/
https://github.com/torch/torch7/blob/master/doc/storage.md
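A minimal sketch of narrow, using the standard tensor:narrow(dim, index, size) call from the docs linked above. The returned sub-tensor is a view that shares storage with the original, which is exactly how the 5x6 printout shown at the top of this section can be produced:

x = torch.zeros(5, 6)
y = x:narrow(1, 2, 3)   -- a 3x6 view of rows 2..4 of x
y:fill(1)               -- writing into the view also modifies x
print(y)
print(x)                -- rows 2..4 of x are now ones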


Attention Model for CNN

Attention Model for RNN

https://github.com/harvardnlp/seq2seq-attn/blob/master/s2sa/models.lua

Imp blog

Attention Mechanism
http://torch.ch/blog/2015/09/21/rmva.html
http://yanran.li/peppypapers/2015/10/07/survey-attention-model-1.html
https://www.quora.com/What-is-exactly-the-attention-mechanism-introduced-to-RNN-recurrent-neural-network-It-would-be-nice-if-you-could-make-it-easy-to-understand

Attention and Memory in Deep Learning and NLP

NNgraph

https://github.com/torch/nngraph

A network with containers

Another net that uses container modules (like ParallelTable) that output a table of outputs.

m = nn.Sequential()
m:add(nn.SplitTable(1))
m:add(nn.ParallelTable():add(nn.Linear(10, 20)):add(nn.Linear(10, 30)))
input = nn.Identity()()
input1, input2 = m(input):split(2)
m3 = nn.JoinTable(1)({input1, input2})

g = nn.gModule({input}, {m3})

indata = torch.rand(2, 10)
gdata = torch.rand(50)
g:forward(indata)
g:backward(indata, gdata)

graph.dot(g.fg, 'Forward Graph')
graph.dot(g.bg, 'Backward Graph')

Tensor

http://jucor.github.io/torch-doc-template/tensor.html

 

 

 

LSTM

 

 

http://kbullaughey.github.io/lstm-play/lstm/

Torch Tips

=================================

 python -m SimpleHTTPServer

 

http://<your-ip>:8000 (or whichever port you passed to SimpleHTTPServer) will then serve the current directory over the network.

 

 

========================================================
-- list running screens
screen -ls

-- resume a screen
screen -r 18497.new_vision

-- detach from a screen
Ctrl+A, then D

=============================================================

how to convert a table to tensor in torch

torch.Tensor(table)

The argument is assumed to be a Lua array of numbers. The constructor returns a new Tensor of the size of the table, containing all the table elements. The table might be multi-dimensional.

Example:

> torch.Tensor({{1,2,3,4}, {5,6,7,8}})
 1  2  3  4
 5  6  7  8
[torch.DoubleTensor of dimension 2x4]

-- p and q are Lua tables; Torch math operations work on tensors, not tables,
-- so convert them first with torch.Tensor(table)
p = {0.3148, 0.3574, 0.3829, 0.3967, 0.4062, 0.4180, 0.4208, 0.4267, 0.4312, 0.4329}
q = {0.2603, 0.3541, 0.3874, 0.4088, 0.4232, 0.4330, 0.4404, 0.4479, 0.4549, 0.4608, 0.4631, 0.4693, 0.4740, 0.4822}

x = torch.Tensor(p)
y = torch.Tensor(q)
print('p', p)
print(x)
print(y)

x = torch.mul(x, 10000) -- x must be a tensor; if x were still a table this would error
y = torch.mul(y, 10000)
print(y)
print(x)

————————-

[res] torch.mul([res,] tensor1, value)

Multiply all elements in the Tensor by the given value.

z = torch.mul(x, 2) will return a new Tensor with the result of x * 2.

torch.mul(z, x, 2) will put the result of x * 2 in z.

x:mul(2) will multiply all elements of x with 2 in-place.

z:mul(x, 2) will put the result of x * 2 in z.
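A quick sketch of the variants listed above:

x = torch.ones(2, 2)
z = torch.mul(x, 2)     -- new tensor holding x * 2; x is unchanged
x:mul(2)                -- in-place: x now holds all 2s
torch.mul(z, x, 2)      -- writes x * 2 into the existing tensor z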

 

ref

https://github.com/torch/torch7/blob/master/doc/maths.md

https://github.com/torch/torch7/blob/master/doc/tensor.md

 


for math operation

https://github.com/torch/torch7/blob/master/doc/maths.md

————————————————–

for adding table

https://github.com/torch/nn/blob/master/doc/table.md#nn.CAddTable
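A minimal sketch of CAddTable (element-wise sum over a table of tensors), assuming the nn package is installed:

require 'nn'
add = nn.CAddTable()
out = add:forward({torch.ones(2, 2), torch.ones(2, 2):mul(2)})
print(out)              -- a 2x2 tensor of 3s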

—————————————————–

to plot graph

https://github.com/torch/optim/blob/master/doc/logger.md

--[[ Logger: a simple class to log symbols during training,
and automate plot generation
Example:
logger = optim.Logger('somefile.log')    -- file to save stuff
for i = 1,N do                           -- log some symbols during
   train_error = ...                     -- training/testing
   test_error = ...
   logger:add{['training error'] = train_error,
              ['test error'] = test_error}
end
logger:style{['training error'] = '-',   -- define styles for plots
             ['test error'] = '-'}
logger:plot()                            -- and plot
---- OR ----
logger = optim.Logger('somefile.log')    -- file to save stuff
logger:setNames{'training error', 'test error'}
for i = 1,N do                           -- log some symbols during
   train_error = ...                     -- training/testing
   test_error = ...
   logger:add{train_error, test_error}
end
logger:style{'-', '-'}                   -- define styles for plots
logger:plot()                            -- and plot
-----------
logger:setlogscale(true)                 -- enable logscale on Y-axis
logger:plot()                            -- and plot
]]
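A runnable version of the second pattern above, with dummy error values standing in for real training/test errors (the file name dummy.log is arbitrary; assumes the optim package is installed, and plotting additionally requires gnuplot):

require 'optim'
logger = optim.Logger('dummy.log')
logger:setNames{'training error', 'test error'}
for i = 1, 10 do
   logger:add{1 / i, 1.5 / i}   -- dummy values in place of real errors
end
logger:style{'-', '-'}
-- logger:plot()                -- uncomment to render the plot with gnuplot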

 

-------------------------------------

JoinTable

module = JoinTable(dimension, nInputDims)

Creates a module that takes a table of Tensors as input and outputs a Tensor by joining them together along dimension dimension. In the diagram below dimension is set to 1.

+----------+             +-----------+
| {input1, +-------------> output[1] |
|          |           +-----------+-+
|  input2, +-----------> output[2] |
|          |         +-----------+-+
|  input3} +---------> output[3] |
+----------+         +-----------+

The optional parameter nInputDims allows you to specify the number of dimensions that this module will receive. This makes it possible to forward both minibatch and non-minibatch Tensors through the same module.

Example 1

x = torch.randn(5, 1)
y = torch.randn(5, 1)
z = torch.randn(2, 1)

print(nn.JoinTable(1):forward{x, y})
print(nn.JoinTable(2):forward{x, y})
print(nn.JoinTable(1):forward{x, z})

gives the output:

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
 0.1575
 0.4491
 0.6580
 0.1784
-1.7362
[torch.DoubleTensor of dimension 10x1]

 1.3965  0.1575
 0.5146  0.4491
-1.5244  0.6580
-0.9540  0.1784
 0.4256 -1.7362
[torch.DoubleTensor of dimension 5x2]

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
-1.2660
 1.0869
[torch.Tensor of dimension 7x1]

Example 2

module = nn.JoinTable(2, 2)

x = torch.randn(3, 1)
y = torch.randn(3, 1)

mx = torch.randn(2, 3, 1)
my = torch.randn(2, 3, 1)

print(module:forward{x, y})
print(module:forward{mx, my})

gives the output:

 0.4288  1.2002
-1.4084 -0.7960
-0.2091  0.1852
[torch.DoubleTensor of dimension 3x2]

(1,.,.) =
  0.5561  0.1228
 -0.6792  0.1153
  0.0687  0.2955

(2,.,.) =
  2.5787  1.8185
 -0.9860  0.6756
  0.1989 -0.4327
[torch.DoubleTensor of dimension 2x3x2]

A more complicated example

mlp = nn.Sequential()         -- Create a network that takes a Tensor as input
c = nn.ConcatTable()          -- The same Tensor goes through two different Linear
c:add(nn.Linear(10, 3))       -- Layers in Parallel
c:add(nn.Linear(10, 7))
mlp:add(c)                    -- Outputing a table with 2 elements
p = nn.ParallelTable()        -- These tables go through two more linear layers
p:add(nn.Linear(3, 2))        -- separately.
p:add(nn.Linear(7, 1))
mlp:add(p)
mlp:add(nn.JoinTable(1))      -- Finally, the tables are joined together and output.

pred = mlp:forward(torch.randn(10))
print(pred)

for i = 1, 100 do             -- A few steps of training such a network..
   x = torch.ones(10)
   y = torch.Tensor(3); y:copy(x:narrow(1, 1, 3))
   pred = mlp:forward(x)

   criterion= nn.MSECriterion()
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(0.05)

   print(err)
end

Ref:

https://github.com/torch/nn/blob/master/doc/table.md#nn.JoinTable

Concat

module = nn.Concat(dim)

Concat concatenates the output of one layer of “parallel” modules along the provided dimension dim: they take the same inputs, and their output is concatenated.

mlp=nn.Concat(1);
mlp:add(nn.Linear(5,3))
mlp:add(nn.Linear(5,7))
print(mlp:forward(torch.randn(5)))

which gives the output:

 0.7486
 0.1349
 0.7924
-0.0371
-0.4794
 0.3044
-0.0835
-0.7928
 0.7856
-0.1815
[torch.Tensor of dimension 10]

[res] torch.cat( [res,] {x_1, x_2, …}, [dimension] )

x = torch.cat(x_1, x_2, [dimension]) returns a Tensor x which is the concatenation of Tensors x_1 and x_2 along dimension dimension.

If dimension is not specified it is the last dimension.

The other dimensions of x_1 and x_2 have to be equal.

Also supports arrays with arbitrary numbers of Tensors as inputs.

Examples:

> torch.cat(torch.ones(3), torch.zeros(2))
 1
 1
 1
 0
 0
[torch.DoubleTensor of size 5]

> torch.cat(torch.ones(3, 2), torch.zeros(2, 2), 1)
 1  1
 1  1
 1  1
 0  0
 0  0
[torch.DoubleTensor of size 5x2]

> torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 1)
 1  1
 1  1
 0  0
 0  0
[torch.DoubleTensor of size 4x2]

> torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 2)
 1  1  0  0
 1  1  0  0
[torch.DoubleTensor of size 2x4]

> torch.cat(torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 1), torch.rand(3, 2), 1)
 1.0000  1.0000
 1.0000  1.0000
 0.0000  0.0000
 0.0000  0.0000
 0.3227  0.0493
 0.9161  0.1086
 0.2206  0.7449
[torch.DoubleTensor of size 7x2]

> torch.cat({torch.ones(2, 2), torch.zeros(2, 2), torch.rand(3, 2)}, 1)
 1.0000  1.0000
 1.0000  1.0000
 0.0000  0.0000
 0.0000  0.0000
 0.3227  0.0493
 0.9161  0.1086
 0.2206  0.7449
[torch.DoubleTensor of size 7x2]

ConcatTable

module = nn.ConcatTable()

ConcatTable is a container module that applies each member module to the same input Tensor or table.

                  +-----------+
             +----> {member1, |
+-------+    |    |           |
| input +----+---->  member2, |
+-------+    |    |           |
   or        +---->  member3} |
 {input}          +-----------+

Example 1

mlp = nn.ConcatTable()
mlp:add(nn.Linear(5, 2))
mlp:add(nn.Linear(5, 3))

pred = mlp:forward(torch.randn(5))
for i, k in ipairs(pred) do print(i, k) end

which gives the output:

1
-0.4073
 0.0110
[torch.Tensor of dimension 2]

2
 0.0027
-0.0598
-0.1189
[torch.Tensor of dimension 3]

Example 2

mlp = nn.ConcatTable()
mlp:add(nn.Identity())
mlp:add(nn.Identity())

pred = mlp:forward{torch.randn(2), {torch.randn(3)}}
print(pred)

which gives the output (using th):

{
  1 :
    {
      1 : DoubleTensor - size: 2
      2 :
        {
          1 : DoubleTensor - size: 3
        }
    }
  2 :
    {
      1 : DoubleTensor - size: 2
      2 :
        {
          1 : DoubleTensor - size: 3
        }
    }
}

ref:

https://github.com/torch/nn/blob/master/doc/containers.md#nn.Concat

https://github.com/torch/nn/blob/master/doc/table.md#nn.ConcatTable

————————————

Math Library Tutorial

lua-users home
wiki

The math library is documented in section 6.7 of the Reference Manual.[1] Below is a summary of the functions and variables provided. Each is described, with an example, on this page.

math.abs
math.acos
math.asin
math.atan
math.ceil
math.cos
math.deg
math.exp
math.floor
math.fmod
math.huge
math.log
math.max
math.maxinteger
math.min
math.mininteger
math.modf
math.pi
math.rad
math.random
math.randomseed
math.sin
math.sqrt
math.tan
math.tointeger
math.type
math.ult

math.abs

Return the absolute, or non-negative value, of a given value.

> = math.abs(-100)
100
> = math.abs(25.67)
25.67
> = math.abs(0)
0

math.acos , math.asin

Return the inverse cosine and sine in radians of the given value.

> = math.acos(1)
0
> = math.acos(0)
1.5707963267949
> = math.asin(0)
0
> = math.asin(1)
1.5707963267949

Ref:

http://lua-users.org/wiki/MathLibraryTutorial

————————————

Neural Network Package

This package provides an easy and modular way to build and train simple or complex neural networks using Torch.

——————————

Ref:

https://github.com/torch/nn

———————————–

Convolutional layers

A convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. It therefore “blends” one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided based on the dimensionality of the input and output Tensors (Temporal modules for 1D sequences, Spatial modules for 2D inputs such as images, Volumetric modules for 3D inputs).
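As a minimal sketch of the 2D (Spatial) case, a convolution over a 3-channel 32x32 input; the module and its arguments are the standard nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH):

require 'nn'
conv = nn.SpatialConvolution(3, 16, 5, 5)   -- 3 input planes, 16 output planes, 5x5 kernel
out = conv:forward(torch.randn(3, 32, 32))
print(out:size())                           -- 16 x 28 x 28 (no padding, stride 1)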

——————————————————————-

REF:

https://github.com/torch/nn/blob/master/doc/convolution.md#nn.convlayers.dok

https://github.com/torch/nn/blob/master/doc/training.md#nn.traningneuralnet.dok

————————————

Simple layers

Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations (a short sketch follows the list below):

  • Parameterized Modules :
    • Linear : a linear transformation ;
    • SparseLinear : a linear transformation with sparse inputs ;
    • Bilinear : a bilinear transformation with sparse inputs ;
    • PartialLinear : a linear transformation with sparse inputs with the option of only computing a subset ;
    • Add : adds a bias term to the incoming data ;
    • Mul : multiply a single scalar factor to the incoming data ;
    • CMul : a component-wise multiplication to the incoming data ;
    • Euclidean : the euclidean distance of the input to k mean centers ;
    • WeightedEuclidean : similar to Euclidean, but additionally learns a diagonal covariance matrix ;
    • Cosine : the cosine similarity of the input to k mean centers ;
  • Modules that adapt basic Tensor methods :
  • Modules that adapt mathematical Tensor methods :
    • AddConstant : adding a constant ;
    • MulConstant : multiplying a constant ;
    • Max : a max operation over a given dimension ;
    • Min : a min operation over a given dimension ;
    • Mean : a mean operation over a given dimension ;
    • Sum : a sum operation over a given dimension ;
    • Exp : an element-wise exp operation ;
    • Log : an element-wise log operation ;
    • Abs : an element-wise abs operation ;
    • Power : an element-wise pow operation ;
    • Square : an element-wise square operation ;
    • Sqrt : an element-wise sqrt operation ;
    • Clamp : an element-wise clamp operation ;
    • Normalize : normalizes the input to have unit L_p norm ;
    • MM : matrix-matrix multiplication (also supports batches of matrices) ;
  • Miscellaneous Modules :
    • BatchNormalization : mean/std normalization over the mini-batch inputs (with an optional affine transform) ;
    • Identity : forward input as-is to output (useful with ParallelTable) ;
    • Dropout : masks parts of the input using binary samples from a bernoulli distribution ;
    • SpatialDropout : same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;
    • VolumetricDropout : same as Dropout but for volumetric inputs where adjacent voxels are strongly correlated ;
    • Padding : adds padding to a dimension ;
    • L1Penalty : adds an L1 penalty to an input (for sparsity) ;
    • GradientReversal : reverses the gradient (to maximize an objective function) ;
    • GPU : decorates a module so that it can be executed on a specific GPU device.
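A minimal sketch chaining a few of the modules listed above (assumes the nn package is installed):

require 'nn'
m = nn.Sequential()
m:add(nn.Linear(10, 5))        -- parameterized affine transformation
m:add(nn.MulConstant(2.0))     -- element-wise multiply by a constant
m:add(nn.Abs())                -- element-wise absolute value
print(m:forward(torch.randn(10)))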

 

Table Layers

This set of modules allows the manipulation of tables through the layers of a neural network. This allows one to build very rich architectures:

  • table Container Modules encapsulate sub-Modules:
    • ConcatTable: applies each member module to the same input Tensor and outputs a table;
    • ParallelTable: applies the i-th member module to the i-th input and outputs a table;
  • Table Conversion Modules convert between tables and Tensors or tables:
  • Pair Modules compute a measure like distance or similarity from a pair (table) of input Tensors:
  • CMath Modules perform element-wise operations on a table of Tensors:
  • Table of Criteria:

 

CmdLine

This class provides a parameter parsing framework which is very useful when one needs to run several experiments that rely on different parameter settings that are passed in the command line. This class will also override the default print function to direct all the output to a log file as well as screen at the same time.

A sample lua file is given below that makes use of CmdLine class.

cmd = torch.CmdLine()
cmd:text()
cmd:text()
cmd:text('Training a simple network')
cmd:text()
cmd:text('Options')
cmd:option('-seed',123,'initial random seed')
cmd:option('-booloption',false,'boolean option')
cmd:option('-stroption','mystring','string option')
cmd:text()

-- parse input params
params = cmd:parse(arg)

params.rundir = cmd:string('experiment', params, {dir=true})
paths.mkdir(params.rundir)

-- create log file (redirects print output to a log in the run directory)
cmd:log(params.rundir .. '/log', params)
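Assuming the snippet above is saved as train.lua (a hypothetical file name), it could be invoked from the shell like this, overriding any of the declared options:

th train.lua -seed 456 -booloption -stroption 'hello'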

Torch Packages

  • Tensor Library
    • Tensor defines the all powerful tensor object that provides multi-dimensional numerical arrays with type templating.
    • Mathematical operations that are defined for the tensor object types.
    • Storage defines a simple storage interface that controls the underlying storage for any tensor object.
  • File I/O Interface Library
  • Useful Utilities
    • Timer provides functionality for measuring time.
    • Tester is a generic tester framework.
    • CmdLine is a command line argument parsing utility.
    • Random defines a random number generator package with various distributions.
    • Finally useful utility functions are provided for easy handling of torch tensor types and class inheritance.

REF

https://github.com/torch/nn/blob/master/doc/simple.md#nn.Select

https://github.com/torch/nn/blob/master/doc/table.md

Blog : Deeplearning In torch

http://rnduja.github.io/2015/10/07/deep_learning_with_torch_step_4_nngraph/

https://github.com/torch/torch7/blob/master/doc/cmdline.md

https://github.com/torch/torch7

 

Deep Learning with Torch

REF

http://learning.cs.toronto.edu/wp-content/uploads/2015/02/torch_tutorial.pdf

http://hunch.net/~nyoml/torch7.pdf

 

http://atamahjoubfar.github.io/Torch_for_Matlab_users.pdf

http://ml.informatik.uni-freiburg.de/_media/teaching/ws1415/presentation_dl_lect3.pdf

A Torch autoencoder example

Extracting features from MNIST digits

http://rnduja.github.io/2015/11/06/torch-autoencoder/

http://rnduja.github.io/2015/10/13/torch-mnist/

Introduction to nngraph

https://github.com/torch/nngraph

http://kbullaughey.github.io/lstm-play/2015/09/18/introduction-to-nngraph.html

 

LSTM and Fast LSTM

http://www.humphreysheil.com/blog/getting-to-grips-with-lstm-part-one

http://christopher5106.github.io/deep/learning/2016/07/14/element-research-torch-rnn-tutorial.html

 

Torch Slide

http://hunch.net/~nyoml/torch7.pdf

https://moodle.technion.ac.il/mod/forum/discuss.php?d=293691&lang=en

https://github.com/torch/torch7/wiki/Cheatsheet

 


Module

Module is an abstract class which defines fundamental methods necessary for training a neural network. Modules are serializable.

Modules contain two state variables: output and gradInput.

[output] forward(input)

Takes an input object, and computes the corresponding output of the module. In general input and output are Tensors. However, some special sub-classes like table layers might expect something else. Please, refer to each module specification for further information.

After a forward(), the output state variable should have been updated to the new value.

It is not advised to override this function. Instead, one should implement updateOutput(input) function. The forward module in the abstract parent class Module will call updateOutput(input).

[gradInput] backward(input, gradOutput)

Performs a backpropagation step through the module, with respect to the given input. In general this method makes the assumption forward(input) has been called before, with the same input. This is necessary for optimization reasons. If you do not respect this rule, backward() will compute incorrect gradients.

In general input and gradOutput and gradInput are Tensors. However, some special sub-classes like table layers might expect something else. Please, refer to each module specification for further information.

A backpropagation step consists of computing two kinds of gradients at input given gradOutput (gradients with respect to the output of the module). This function simply performs this task using two function calls: updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale).

It is not advised to override this function call in custom classes. It is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale) functions.
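A minimal sketch of the forward/backward calls described above, using nn.Linear; the gradOut here is just a random stand-in for the gradient a criterion would normally provide:

require 'nn'
m = nn.Linear(4, 2)
x = torch.randn(4)
out = m:forward(x)              -- internally calls updateOutput(x)
gradOut = torch.randn(2)        -- stand-in gradient w.r.t. the output
gradIn = m:backward(x, gradOut) -- updateGradInput + accGradParameters
print(out)
print(gradIn)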

updateOutput(input)

Computes the output using the current parameter set of the class and input. This function returns the result which is stored in the output field.

updateGradInput(input, gradOutput)

Computing the gradient of the module with respect to its own input. This is returned in gradInput. Also, the gradInput state variable is updated accordingly.

accGradParameters(input, gradOutput, scale)

Computing the gradient of the module with respect to its own parameters. Many modules do not perform this step as they do not have any parameters. The state variable name for the parameters is module dependent. The module is expected to accumulate the gradients with respect to the parameters in some variable.

scale is a scale factor that is multiplied with the gradParameters before being accumulated.

Zeroing this accumulation is achieved with zeroGradParameters() and updating the parameters according to this accumulation is done with updateParameters().

zeroGradParameters()

If the module has parameters, this will zero the accumulation of the gradients with respect to these parameters, accumulated through accGradParameters(input, gradOutput,scale) calls. Otherwise, it does nothing.

updateParameters(learningRate)

If the module has parameters, this will update these parameters, according to the accumulation of the gradients with respect to these parameters, accumulated through backward() calls.

The update is basically:

parameters = parameters - learningRate * gradients_wrt_parameters

If the module does not have parameters, it does nothing.
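Putting zeroGradParameters(), backward() and updateParameters() together, a single hedged gradient step might look like this (a sketch, not a full training loop; assumes the nn package):

require 'nn'
m = nn.Linear(4, 2)
crit = nn.MSECriterion()
x, target = torch.randn(4), torch.randn(2)
m:zeroGradParameters()                         -- clear accumulated gradients
local err = crit:forward(m:forward(x), target)
m:backward(x, crit:backward(m.output, target))
m:updateParameters(0.01)                       -- parameters = parameters - 0.01 * gradients
print(err)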

accUpdateGradParameters(input, gradOutput, learningRate)

This is a convenience method that performs two functions at once. It calculates and accumulates the gradients with respect to the weights after multiplying them by the negative of the learning rate learningRate. Performing these two operations at once is more efficient and might be advantageous in certain situations.

Keep in mind that, this function uses a simple trick to achieve its goal and it might not be valid for a custom module.

Also note that compared to accGradParameters(), the gradients are not retained for future use.

function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight = self.gradWeight
   local gradBias = self.gradBias
   self.gradWeight = self.weight
   self.gradBias = self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight = gradWeight
   self.gradBias = gradBias
end

As can be seen, the gradients are accumulated directly into the weights. This assumption may not be true for a module that computes a nonlinear operation.

share(mlp,s1,s2,…,sn)

This function modifies the parameters of the module named s1,..sn (if they exist) so that they are shared with (pointers to) the parameters with the same names in the given module mlp.

The parameters have to be Tensors. This function is typically used if you want to have modules that share the same weights or biases.

Note that this function, if called on a Container module, will share the same parameters for all the contained modules as well.

Example:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- we change the bias of the first
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])

clone(mlp,…)

Creates a deep copy of (i.e. not just a pointer to) the module, including the current state of its parameters (e.g. weight, biases etc., if any).

If arguments are provided to the clone(...) function it also calls share(…) with those arguments on the cloned module after creating it, hence making a deep copy of this module with some shared parameters.

Example:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a copy that shares the weights and biases
mlp2=mlp1:clone('weight','bias');

-- we change the bias of the first mlp
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])

type(type[, tensorCache])

This function converts all the parameters of a module to the given type. The type can be one of the types defined for torch.Tensor.

If tensors (or their storages) are shared between multiple modules in a network, this sharing will be preserved after type is called.

To preserve sharing between multiple modules and/or tensors, use nn.utils.recursiveType:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- mlp1 and mlp2 will be converted to float, and will share bias
-- note: tensors can be provided as inputs as well as modules
nn.utils.recursiveType({mlp1, mlp2}, 'torch.FloatTensor')

float([tensorCache])

Convenience method for calling module:type('torch.FloatTensor'[, tensorCache])

double([tensorCache])

Convenience method for calling module:type('torch.DoubleTensor'[, tensorCache])

cuda([tensorCache])

Convenience method for calling module:type('torch.CudaTensor'[, tensorCache])

State Variables

These state variables are useful objects if one wants to check the guts of a Module. The object pointer is never supposed to change. However, its contents (including its size if it is a Tensor) are supposed to change.

In general state variables are Tensors. However, some special sub-classes like table layers contain something else. Please, refer to each module specification for further information.

output

This contains the output of the module, computed with the last call of forward(input).

gradInput

This contains the gradients with respect to the inputs of the module, computed with the last call of updateGradInput(input, gradOutput).

Parameters and gradients w.r.t parameters

Some modules contain parameters (the ones that we actually want to train!). The name of these parameters, and gradients w.r.t these parameters are module dependent.

[{weights}, {gradWeights}] parameters()

This function should return two tables: one for the learnable parameters {weights} and another for the gradients of the energy w.r.t. the learnable parameters {gradWeights}.

Custom modules should override this function if they use learnable parameters that are stored in tensors.

[flatParameters, flatGradParameters] getParameters()

This function returns two tensors: one for the flattened learnable parameters flatParameters and another for the gradients of the energy w.r.t. the learnable parameters flatGradParameters.

Custom modules should not override this function. They should instead override parameters(…) which is, in turn, called by the present function.

This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network.
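A small sketch: flattening the parameters of a module once (e.g. before handing them to an optim routine):

require 'nn'
m = nn.Linear(10, 2)
params, gradParams = m:getParameters()   -- flattened views over weights and gradients
print(params:size())                     -- 22 elements: 10*2 weights + 2 biases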

training()

This sets the mode of the Module (or sub-modules) to train=true. This is useful for modules like Dropout or BatchNormalization that have a different behaviour during training vs evaluation.

evaluate()

This sets the mode of the Module (or sub-modules) to train=false. This is useful for modules like Dropout or BatchNormalization that have a different behaviour during training vs evaluation.

findModules(typename)

Find all instances of modules in the network of a certain typename. It returns a flattened list of the matching nodes, as well as a flattened list of the container modules for each matching node.

Modules that do not have a parent container (i.e., a top-level nn.Sequential for instance) will return themselves as the container.

This function is very helpful for navigating complicated nested networks. As a didactic example, suppose you wanted to print the output size of all nn.SpatialConvolution instances:

-- Construct a multi-resolution convolution network (with 2 resolutions):
model = nn.ParallelTable()
conv_bank1 = nn.Sequential()
conv_bank1:add(nn.SpatialConvolution(3,16,5,5))
conv_bank1:add(nn.Threshold())
model:add(conv_bank1)
conv_bank2 = nn.Sequential()
conv_bank2:add(nn.SpatialConvolution(3,16,5,5))
conv_bank2:add(nn.Threshold())
model:add(conv_bank2)
-- FPROP a multi-resolution sample
input = {torch.rand(3,128,128), torch.rand(3,64,64)}
model:forward(input)
-- Print the size of the SpatialConvolution outputs
conv_nodes = model:findModules('nn.SpatialConvolution')
for i = 1, #conv_nodes do
  print(conv_nodes[i].output:size())
end

Another use might be to replace all nodes of a certain typename with another. For instance, if we wanted to replace all nn.Threshold with nn.Tanh in the model above:

threshold_nodes, container_nodes = model:findModules('nn.Threshold')
for i = 1, #threshold_nodes do
  -- Search the container for the current threshold node
  for j = 1, #(container_nodes[i].modules) do
    if container_nodes[i].modules[j] == threshold_nodes[i] then
      -- Replace with a new instance
      container_nodes[i].modules[j] = nn.Tanh()
    end
  end
end

listModules()

List all Modules instances in a network. Returns a flattened list of modules, including container modules (which will be listed first), self, and any other component modules.

For example :

mlp = nn.Sequential()
mlp:add(nn.Linear(10,20))
mlp:add(nn.Tanh())
mlp2 = nn.Parallel()
mlp2:add(mlp)
mlp2:add(nn.ReLU())
for i,module in ipairs(mlp2:listModules()) do
   print(module)
end

Which will result in the following output:

nn.Parallel {
  input
    |`-> (1): nn.Sequential {
    |      [input -> (1) -> (2) -> output]
    |      (1): nn.Linear(10 -> 20)
    |      (2): nn.Tanh
    |    }
    |`-> (2): nn.ReLU
     ... -> output
}
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.Linear(10 -> 20)
  (2): nn.Tanh
}
nn.Linear(10 -> 20)
nn.Tanh
nn.ReLU

clearState()

Clears intermediate module states such as output, gradInput and others. Useful when serializing networks and running low on memory. Internally calls set() on tensors so it does not break buffer sharing.

apply(function)

Calls the provided function on itself and all child modules. The function takes the module to operate on as its first argument:

model:apply(function(module)
   module.train = true
end)

In the example above, train will be set to true in all modules of model. This is how the training() and evaluate() functions are implemented.

replace(function)

Similar to apply, this takes a function which is applied to all modules of a model, but uses the return value to replace the module. It can be used to replace all modules of one type with another or to remove certain modules.

For example, it can be used to remove nn.Dropout layers by replacing them with nn.Identity:

model:replace(function(module)
   if torch.typename(module) == 'nn.Dropout' then
      return nn.Identity()
   else
      return module
   end
end)






_______________________________________________________________________________

narrow, select and copy: views vs. copies, with examples (see the links and the short sketch below)

https://github.com/torch/torch7/blob/master/doc/tensor.md#self-narrowdim-index-size
https://github.com/torch/nn/blob/master/doc/simple.md#nn.Cosine
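A short sketch of the difference, assuming the standard tensor API from the links above: narrow and select return views that share storage, while copy fills a separately allocated tensor:

x = torch.range(1, 12):resize(3, 4)
row = x:select(1, 2)              -- second row, a 1D view of size 4
cols = x:narrow(2, 1, 2)          -- first two columns, a 3x2 view
y = torch.Tensor(3, 4):copy(x)    -- a real copy with its own storage
row:fill(0)                       -- modifies x, but not y
print(x)
print(y)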


Vision and language


http://visionandlanguage.net



http://handong1587.github.io/deep_learning/2015/10/09/nlp.html/

Lemma, Theorem, Axiom, Statements

A statement is a sentence which has objective and logical meaning.

A proposition is a statement which is offered up for investigation as to its truth or falsehood.

The term axiom is used throughout the whole of mathematics to mean a statement which is accepted as true for that particular branch.

Different fields of mathematics usually have different sets of statements which are considered as being axiomatic.

The term theorem is used throughout the whole of mathematics to mean a statement which has been proved to be true from whichever axioms relevant to that particular branch.

So statements which are taken as axioms in one branch of mathematics may be theorems, or irrelevant, in others.

A definition lays down the meaning of a concept.
It is a statement which tells the reader what something is.

A lemma is a statement which is proven during the course of reaching the proof of a theorem.


Logically there is no qualitative difference between a lemma and a theorem. They are both statements whose value is either true or false. However, a lemma is seen more as a stepping-stone than a theorem in itself (and frequently takes a lot more work to prove than the theorem to which it leads).
Some lemmas are famous enough to be named after the mathematician who proved them (for example: Abel’s Lemma and Urysohn’s Lemma), but they are still categorised as second-class citizens in the aristocracy of mathematics.

A corollary is a proof which is a direct result, or a direct application, of another proof.
It can be considered as being a proof for free on the back of a proof which has been paid for with blood, sweat and tears.
The word is ultimately derived from the Latin corolla, meaning small garland, or the money paid for it. Hence it has the sense of something extra, a lagniappe or freebie.

————————————————————

Difference between axioms, theorems, postulates, corollaries, and hypotheses

Based on logic, an axiom or postulate is a statement that is considered to be self-evident. Both axioms and postulates are assumed to be true without any proof or demonstration. Basically, something that is obvious or declared to be true and accepted, but for which there is no proof, is called an axiom or a postulate. Axioms and postulates serve as a basis for deducing other truths.

The ancient Greeks recognized the difference between these two concepts. Axioms are self-evident assumptions, which are common to all branches of science, while postulates are related to the particular science.

Axioms

Aristotle himself used the term “axiom”, which comes from the Greek “axioma”, meaning “to deem worthy”, but also “to require”. Aristotle had some other names for axioms. He used to call them “the common things” or “common opinions”. In Mathematics, axioms can be categorized as “logical axioms” and “non-logical axioms”. Logical axioms are propositions or statements which are considered universally true. Non-logical axioms, sometimes called postulates, define properties for the domain of a specific mathematical theory, or logical statements which are used in deduction to build mathematical theories. “Things which are equal to the same thing are equal to one another” is an example of a well-known axiom laid down by Euclid.

Postulates

The term “postulate” is from the Latin “postulare”, a verb which means “to demand”. The master demanded that his pupils agree to certain statements upon which he could build. Unlike axioms, postulates aim to capture what is special about a particular structure. “It is possible to draw a straight line from any point to any other point”, “It is possible to produce a finite straight line continuously in a straight line”, and “It is possible to describe a circle with any center and any radius” are a few examples of postulates illustrated by Euclid.

What is the difference between Axioms and Postulates?

• An axiom is generally true for any field in science, while a postulate can be specific to a particular field.

• An axiom cannot be proved from other axioms, while postulates may be provable from axioms.


In Geometry, “Axiom” and “Postulate” are essentially interchangeable. In antiquity, they referred to propositions that were “obviously true” and only had to be stated, and not proven. In modern mathematics there is no longer an assumption that axioms are “obviously true”. Axioms are merely ‘background’ assumptions we make. The best analogy I know is that axioms are the “rules of the game”. In Euclid’s Geometry, the main axioms/postulates are:

  1. Given any two distinct points, there is a line that contains them.
  2. Any line segment can be extended to an infinite line.
  3. Given a point and a radius, there is a circle with center in that point and that radius.
  4. All right angles are equal to one another.
  5. If a straight line falling on two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which are the angles less than the two right angles. (The parallel postulate).

A theorem is a logical consequence of the axioms. In Geometry, the “propositions” are all theorems: they are derived using the axioms and the valid rules. A “Corollary” is a theorem that is usually considered an “easy consequence” of another theorem. What is or is not a corollary is entirely subjective. Sometimes what an author thinks is a ‘corollary’ is deemed more important than the corresponding theorem. (The same goes for “Lemma“s, which are theorems that are considered auxiliary to proving some other, more important in the view of the author, theorem).

A “hypothesis” is an assumption made. For example: “If x is an even integer, then x² is an even integer.” I am not asserting that x² is even or odd; I am asserting that if something happens (namely, if x happens to be an even integer) then something else will also happen. Here, “x is an even integer” is the hypothesis being made.


  1. Since it is not possible to define everything, as it leads to a never-ending infinite loop of circular definitions, mathematicians get out of this problem by imposing “undefined terms”: words we never define. In most of mathematics the two undefined terms are set and element of.
  2. We would like to be able to prove various things concerning sets. But how can we do so if we never defined what a set is? So what mathematicians do next is impose a list of axioms. An axiom is some property of your undefined object. So even though you never define your undefined terms, you have rules about them. The rules that govern them are the axioms. One does not prove an axiom; in fact one can choose it to be anything one wishes (of course, if it is done mindlessly it will lead to something trivial).
  3. Now that we have our axioms and undefined terms we can form some main definitions for what we want to work with.
  4. After we have defined some things we can write down some basic proofs, usually known as propositions. Propositions are those mathematical facts that are generally straightforward to prove and generally follow easily from the definitions.
  5. Deep propositions that are an overview of all your currently collected facts are usually called Theorems. A good litmus test to know the difference between a Proposition and a Theorem, as somebody once remarked here, is that if you are proud of a proof you call it a Theorem, otherwise you call it a Proposition. Think of a theorem as one of the end goals we would like to reach: deep connections that are also very beautiful results.
  6. Sometimes in proving a Proposition or a Theorem we need some technical facts. Those are called Lemmas. Lemmas are usually not useful by themselves. They are only used to prove a Proposition/Theorem, and then we forget about them.
  7. The net collection of definitions, propositions, and theorems forms a mathematical theory.

Technically, axioms are self-evident or self-proving, while postulates are simply taken as given. However, really only Euclid, high-end theorists, and some polymaths make such a distinction. See http://www.friesian.com/space.htm

Theorems are then derived from the “first principles” i.e. the axioms and postulates.

——————————————————–

Lemma is generally used to describe a “helper” fact that is used in the proof of a more significant result.

Significant results are frequently called theorems.

Short, easy results of theorems are called corollaries.

But the words aren’t exactly that set in stone.


A lot of authors like to use lemma to mean “small theorem.” Often a group of lemmas are used to prove a larger result, a “theorem.”

A corollary is something that follows trivially from any one of a theorem, lemma, or other corollary.

However, when it boils down to it, all of these things are equivalent as they denote the truth of a statement.

—————————————————————————

Theorem — a mathematical statement that is proved using rigorous mathematical reasoning. In a mathematical paper, the term theorem is often reserved for the most important results.

Lemma — a minor result whose sole purpose is to help in proving a theorem. It is a stepping stone on the path to proving a theorem. Very occasionally lemmas can take on a life of their own (Zorn’s lemma, Urysohn’s lemma, Burnside’s lemma, Sperner’s lemma).

Corollary — a result in which the (usually short) proof relies heavily on a given theorem (we often say that “this is a corollary of Theorem A”).


An axiom is any assumption/statement, which cannot be deduced or proven from existing material on that topic. The axiom of parallelism in Euclidean geometry is one such example.

Axioms are used as building blocks or foundations to prove a certain statement, which is called a “theorem”. Again, taking an example from Euclidean geometry, let us consider Pythagoras’ theorem. While providing supporting statements for the arguments we make to prove this theorem, we make use of the “axioms”.

Corollary to a theorem is a slight modification of the statement of the theorem, which can easily be deduced from the theorem itself.

A hypothesis is a statement whose correctness one wishes to test. Generally, in statistics, a statement is made about the data initially and is tested at a significance level, e.g. “The distribution under observation is a normal distribution with μ = μ₀” is a hypothesis. I still have a bit of confusion with the exact meaning conveyed by the word “postulate”.


As far as geometry is concerned, “Axiom” and “Postulate” are essentially interchangeable. Thus, a postulate refers to a proposition which is obviously true and hence need not be proven.

However, in modern mathematics there is no longer an assumption that axioms are “obviously true”. They can be looked at as the rules of a game: you cannot go against them while playing, otherwise you will be disqualified.

What is a theory then? Is it a set of theorems/results derived from the postulates?


You have all the points covered pretty well, but I’ll just point out that the usage of hypothesis in mathematics differs slightly from that in the sciences. A mathematical hypothesis isn’t treated as a statement to be proved but as a starting assumption that is made at the beginning of a proof. As a rule of thumb, anything that follows the word ‘if’ or ‘given’ in a theorem constitutes the hypothesis.

However, there is some overlap with its meaning in science. When you prove something by contradiction you make a starting assumption i.e. posit a hypothesis that the negation of the statement to be proved is true and then disprove it.


A scientific theory is a well-substantiated explanation of some aspect of the natural world that can incorporate facts, laws, inferences, and tested hypotheses. A scientific theory is not as clear cut as a scientific fact or law, in that facts and laws must be repeatedly confirmed through observation and experimentation and also be widely accepted to be true by the scientific community. Often it has happened that a theory had to be discarded as it could not explain the observed phenomena. Further a new theory came up, which explained all new phenomena as well as the older ones.

It is worth mentioning here that a few results have a specific term attached to them for historical reasons, e.g. the Riemann Hypothesis and the Collatz conjecture. These do not always agree with the usual usage of the words.

 


Words like “fact,” “theory,” and “law,” get thrown around a lot. When it comes to science, however, they mean something very specific; and knowing the difference between them can help you better understand the world of science as a whole.

In this fantastic video from the It’s Okay To Be Smart YouTube channel, host Joe Hanson clears up some of the confusion surrounding four very important scientific terms: fact, hypothesis, theory, and law. Knowing the difference between these words is the key to understanding news, studies, and any other information that comes from the scientific community. Here are the main takeaways:

  • Fact: Observations about the world around us. Example: “It’s bright outside.”
  • Hypothesis: A proposed explanation for a phenomenon made as a starting point for further investigation. Example: “It’s bright outside because the sun is probably out.”
  • Theory: A well-substantiated explanation acquired through the scientific method and repeatedly tested and confirmed through observation and experimentation. Example: “When the sun is out, it tends to make it bright outside.”
  • Law: A statement based on repeated experimental observations that describes some phenomenon of nature. Proof that something happens and how it happens, but not why it happens. Example: Newton’s Law of Universal Gravitation.
Essentially, this is how all science works. You probably knew some of this, or remember bits and pieces of it from grade school, but this video does a great job of explaining the entire process. When you know how something actually works, it makes it a lot easier to understand and scrutinize.

 

——————————————————————-

Ref:

https://www.quora.com/What-are-the-differences-between-theorems-definitions-axioms-lemmas-corollaries-propositions-and-statements

http://functionspace.com/topic/3465/Axioms–Postulates–Theorems–Corollaries–Hypotheses–Theories

http://math.stackexchange.com/questions/7717/difference-between-axioms-theorems-postulates-corollaries-and-hypotheses

http://math.stackexchange.com/questions/463362/difference-between-theorem-lemma-and-corollary

http://functionspace.com/topic/3465/Axioms–Postulates–Theorems–Corollaries–Hypotheses–Theories

http://lifehacker.com/the-difference-between-a-fact-hypothesis-theory-and-1732904200

————————————————————————————–

About CNN

 

http://www.cc.gatech.edu/~hays/compvision/results/proj6/yyeh32/index_files/image005.png
REF: [Srinivas et al.]
Reference

Case Study of Convolutional Neural Network

 

Ref:

http://www.slideshare.net/jbhuang/lecture-29-convolutional-neural-networks-computer-vision-spring2015

http://www.mdpi.com/2072-4292/7/11

Transfer Learning

Action and Attributes from Wholes and Parts

 

R-CNNs for Pose Estimation and Action Detection

 

 

Contextual Action Recognition with R*CNN

—————————————————————-

REF: http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/

 

http://www.mshahriarinia.com/home/ai/machine-learning/neural-networks/deep-learning/theano-mnist/3-convolutional-neural-network-lenet

 

 

Good One

 

http://rnduja.github.io/2015/10/12/deep_learning_with_torch_step_5_rnn_lstm/

 

 

 

 

Implementation Issue for VQA

1. Jupyter Setup Issue


 

---------------------------------------------------------------
$jupyter-notebook
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-notebook", line 7, in <module>
    from notebook.notebookapp import main
  File "/usr/local/lib/python2.7/dist-packages/notebook/notebookapp.py",
line 45, in <module>
    raise ImportError(msg + ", but you have %s" % tornado.version)
ImportError: The Jupyter Notebook requires tornado >= 4.0, but you have 3.1.1
------------------------------------------------------------------
Answer: This problem will be seen if you have installed both ipython-notebook and
jupyter-notebook.

It can be solved by uninstalling ipython-notebook.


2. CuDNN problems

user@user-XPS-8500:~/neural-style$ th neural_style.lua -gpu 0 -backend cudnn
nil 
/home/user/torch/install/bin/luajit: /home/user/torch/install/share/lua/5.1/trepl/init.lua:384: /home/ben/torch/install/share/lua/5.1/trepl/init.lua:384: /home/user/torch/install/share/lua/5.1/cudnn/ffi.lua:1279: 'libcudnn (R4) not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure files named as libcudnn.so.4 or libcudnn.4.dylib are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
    [C]: in function 'error'
    /home/user/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    neural_style.lua:64: in function 'main'
    neural_style.lua:500: in main chunk
    [C]: in function 'dofile'
    .../user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670
user@user-XPS-8500:~/neural-style$ 
-----------------------------------------------
& Solution

After registering as a developer with NVIDIA, you can download cuDNN here. Make sure to download Version 4.

After downloading, you can unpack and install cuDNN like this:

tar -xzvf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-7.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.0/include/

Also check your LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

You should see /usr/local/cuda-7.0/lib64 along with possibly other things.

If you do not see it, you need to add something like this to your .bashrc or other startup scripts:

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

then source ~/.bashrc.

This should solve the problem.

———————————————————————————–

Setup for Torch

Steps for torch installation:

  1. rm -rf ~/torch
  2. git clone https://github.com/torch/distro.git ~/torch --recursive
  3. cd ~/torch;
  4. ./install.sh
  5. source ~/.bashrc

Update

To update your already installed distro to the latest master branch

./update.sh

Test

Test that all libraries are installed properly by running:

./test.sh

For other packages:

  1. luarocks install rnn
  2. luarocks install loadcaffe
  3. luarocks install hdf5

————————————————————————————-

If you face a problem in the installation like:

issue: cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator “==” matches these operands:

 

[ 14%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator "==" matches these operands:
            function "operator==(const __half &, const __half &)"
            function "operator==(half, half)"
            operand types are: half == half

/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(414): error: more than one operator "==" matches these operands:
            function "operator==(const __half &, const __half &)"
            function "operator==(half, half)"
            operand types are: half == half

[ 15%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathReduce.cu.o
2 errors detected in the compilation of "/tmp/tmpxft_00002141_00000000-4_THCTensorMath.cpp4.ii".
CMake Error at THC_generated_THCTensorMath.cu.o.cmake:267 (message):
  Error generating file
  /home/ubuntu/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorMath.cu.o


lib/THC/CMakeFiles/THC.dir/build.make:112: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
^Clib/THC/CMakeFiles/THC.dir/build.make:105: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o] Interrupt
lib/THC/CMakeFiles/THC.dir/build.make:140: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o] Interrupt
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Interrupt
Makefile:127: recipe for target 'all' failed
make: *** [all] Interrupt

Error: Build error: Failed building.

Solution:

  1. ./clean.sh (clear the installation)
  2. export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
  3. ./install.sh

 

Ref: https://github.com/torch/cutorch/issues/797

https://github.com/torch/distro/issues/239

 

 

 

Issue: Data parallel: arguments are located on different GPUs: /usr/local/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu:21
stack traceback:

Solution:

This is a problem with the makeDataParallel function, which uses cutorch.

Reinstall torch with the latest release,

or

update and then reinstall:

  • luarocks install cutorch
  • luarocks install cunn

 

Issue: $ luarocks install cutorch
Warning: Failed searching manifest: Failed fetching manifest for https://raw.githubusercontent.com/torch/rocks/master – Failed downloading 

Solution: Reinstall torch

 

Reference:

http://torch.ch/

https://en.wikipedia.org/wiki/Torch_(machine_learning)

 

Torch Installation

Step 0: Installing Torch

# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

The first script installs the basic package dependencies that LuaJIT and Torch require. The second script installs LuaJIT, LuaRocks, and then uses LuaRocks (the lua package manager) to install core packages like torch, nn and paths, as well as a few other packages.

The script adds torch to your PATH variable. You just have to source it once to refresh your env variables. The installation script will detect what is your current shell and modify the path in the correct configuration file.

# On Linux with bash
source ~/.bashrc

if you ever need to uninstall torch, simply run the command:

rm -rf ~/torch

Step 1: Install torch with Lua 5.2

If you want to install torch with Lua 5.2 instead of LuaJIT, simply run:

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch

# clean old torch installation
./clean.sh

# https://github.com/torch/distro : set env to use lua
TORCH_LUA_VERSION=LUA52 ./install.sh

Step 3: Install required packages using Luarocks

New packages can be installed using Luarocks from the command-line:

# Install using luarocks
luarocks install torch
luarocks install nn
luarocks install nngraph 
luarocks install image 
luarocks install optim
luarocks install lua-cjson
# major
luarocks install torch-word-emb
luarocks install rnn
luarocks install nnx
luarocks install dp
luarocks install dpnn
luarocks install itorch
luarocks install sys
luarocks install xlua
luarocks install penlight 

luarocks install display
luarocks install gnuplot
luarocks install imgraph
luarocks install signal

# optional
luarocks install senna
luarocks install cephes
-----------------------------------------------------------

Step 4: Install torch-hdf5

# We need to install torch-hdf5 from GitHub
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

Step 5: Install loadcaffe

loadcaffe depends on Google’s Protocol Buffer library so we’ll need to install that first:

sudo apt-get install libprotobuf-dev protobuf-compiler

Now we can install loadcaffe:

luarocks install loadcaffe

Step 6: Install CUDA backend for torch (for GPU only)

If you’d like to train on an NVIDIA GPU using CUDA (this can be about 15x faster), you’ll of course need the GPU, and you will have to install the CUDA Toolkit. Then get the cutorch and cunn packages:

luarocks install cutorch
luarocks install cunn
luarocks install cudnn

Step 7: Install OpenCL backend for torch (optional)

If you’d like to use an OpenCL GPU instead (e.g. ATI cards), you will need to install the cltorch and clnn packages, and then use the option -opencl 1 during training (cltorch issues):

luarocks install cltorch
luarocks install clnn

Step 8: Install cuDNN(Optional)

cuDNN is a library from NVIDIA that efficiently implements many of the operations (like convolutions and pooling) that are commonly used in deep learning.

After registering as a developer with NVIDIA, you can download cuDNN here. Make sure to download Version 4.

After downloading, you can unpack and install cuDNN like this:

tar -xzvf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-7.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.0/include/

Also check your LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

You should see /usr/local/cuda-7.0/lib64 along with possibly other things.

If you don’t see it, you need to add something like this to your .bashrc or other startup scripts:

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

Then run source ~/.bashrc.

Setting LD_LIBRARY_PATH correctly resolves the following problem:

CuDNN problems

user@user-XPS-8500:~/neural-style$ th neural_style.lua -gpu 0 -backend cudnn
nil 
/home/user/torch/install/bin/luajit: /home/user/torch/install/share/lua/5.1/trepl/init.lua:384: /home/ben/torch/install/share/lua/5.1/trepl/init.lua:384: /home/user/torch/install/share/lua/5.1/cudnn/ffi.lua:1279: 'libcudnn (R4) not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure files named as libcudnn.so.4 or libcudnn.4.dylib are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
    [C]: in function 'error'
    /home/user/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    neural_style.lua:64: in function 'main'
    neural_style.lua:500: in main chunk
    [C]: in function 'dofile'
    .../user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670
user@user-XPS-8500:~/neural-style$ 
-----------------------------------------------

Next we need to install the torch bindings for cuDNN:

luarocks install cudnn
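
To confirm that the bindings can actually see libcudnn, it is enough to require the package and construct one cuDNN module. A minimal sketch (assumes the cutorch/cunn packages from the CUDA backend step are installed and LD_LIBRARY_PATH is set as above):

-- constructing a cuDNN convolution confirms both the bindings and the library are visible
require 'cunn'
require 'cudnn'
local conv = cudnn.SpatialConvolution(3, 16, 3, 3)
print(conv)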
------------------------------------------------------------------------

Ref
http://torch.ch/docs/getting-started.html
https://github.com/torch/torch7/wiki/Cheatsheet
https://github.com/jcjohnson/neural-style/issues/154
https://github.com/jcjohnson/neural-style/blob/master/INSTALL.md
https://github.com/karpathy/char-rnn
https://github.com/karpathy/neuraltalk2


Install Torch Locally
https://github.com/torch/distro

Self-contained Torch installation

Install dependencies. This uses apt-get on Ubuntu (which might require sudo) and brew on OSX.

curl -s https://raw.githubusercontent.com/torch/distro/master/install-deps | bash

Install this repo, which installs the torch distribution, with a lot of nice goodies.

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh

By default Torch will install LuaJIT 2.1. If you want other options, you can use the command:

TORCH_LUA_VERSION=LUA51 ./install.sh
TORCH_LUA_VERSION=LUA52 ./install.sh

Now, everything should be installed. Either open a new shell, or source your profile via

. ~/.bashrc  # or: . ~/.zshrc
th -e "print 'I just installed Torch! Yesss.'"

Note: If you use a non-standard shell, you’ll want to run this command:

./install/bin/torch-activate

Tested on Ubuntu 14.04, CentOS/RHEL 6.3 and OSX

For more information, see:
https://github.com/torch/torch7/issues/27

Not required, but if you want to customize the LuaRocks tree (see http://leafo.net/guides/customizing-the-luarocks-tree.html), you can install rocks into a local tree, for example:

luarocks install --deps-mode=all --local nngraph

Torch Installation 3 Steps

Step 1:
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh

By default Torch will install LuaJIT 2.1. If you want other options, you can use the command:

Step 2 (optional):
TORCH_LUA_VERSION=LUA51 ./install.sh

Now, everything should be installed. Either open a new shell, or source your profile via

Step 3:
. ~/.bashrc 
th -e "print 'I just installed Torch! Yesss.'"

Torch uninstallation

rm -rf ~/torch 

Luarocks package installation

luarocks install cunn 1.0.0
luarocks remove  cutorch

Luarocks package removal

remove – Uninstall a rock.
luarocks remove --force cunn 1.0.0
luarocks remove --force cutorch

 

REF:
https://github.com/keplerproject/luarocks/wiki/luarocks
https://github.com/keplerproject/luarocks/wiki/remove
http://torch.ch/docs/getting-started.html
https://github.com/torch/torch7/wiki/Cheatsheet

Setup for Keras (TensorFlow backend) and Keras (Theano backend)

Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano.

This installation uses Miniconda with Python 2.x. Keras also uses the following dependencies:

  1. NumPy
  2. SciPy
  3. HDF5 and h5py
  4. Theano
  5.  TensorFlow

Step 0: Install miniconda

$ wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-`uname -p`.sh
$ bash Miniconda-latest-Linux-`uname -p`.sh
$ source ~/.bashrc

See also the install guide.

Step 1: Numpy, scipy, nose

a: NumPy

NumPy is the fundamental package needed for scientific computing with Python. This package contains:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities.

b: SciPy

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages: NumPy, the SciPy library, Matplotlib, IPython, SymPy, and pandas.

c: Nose

nose extends unittest to make testing easier

d: pip

pip is the preferred installer program for Python; it is a package management system used to install and manage software packages written in Python.

Install numpy, scipy, nose

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose libopenblas-dev git
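
A quick way to verify the scientific stack is to import it and exercise broadcasting. A minimal sketch:

# check that NumPy and SciPy import and that broadcasting works
import numpy as np
import scipy

print(np.__version__)
print(scipy.__version__)
a = np.arange(3)      # array([0, 1, 2])
print(a + 10)         # broadcasting: array([10, 11, 12])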

Step 2: Install h5py

$ conda install -y h5py
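
A short round trip confirms both HDF5 and h5py are working. A minimal sketch (test.h5 is an arbitrary file name):

# write a small dataset to an HDF5 file and read it back
import numpy as np
import h5py

with h5py.File('test.h5', 'w') as f:
    f.create_dataset('x', data=np.arange(10))
with h5py.File('test.h5', 'r') as f:
    print(f['x'][:])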

Step 3: Install Theano or TensorFlow

Please install either Theano or TensorFlow. If you want to use Keras with the Theano backend, don't install TensorFlow, and vice versa.

A: Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

  • tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
  • transparent use of a GPU – Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
  • efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
  • dynamic C code generation – Evaluate expressions faster.
  • extensive unit-testing and self-verification – Detect and diagnose many types of errors.

Install Theano

$ pip install git+git://github.com/Theano/Theano.git

Ref: http://deeplearning.net/software/theano/
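
The core Theano workflow is: declare symbolic variables, build an expression, compile it with theano.function, and then call the compiled function on real data. A minimal sketch along those lines:

# define a symbolic expression, compile it, and evaluate it
import theano
import theano.tensor as T

x = T.dscalar('x')
y = x ** 2 + 1
f = theano.function([x], y)
print(f(3.0))   # prints 10.0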

B: TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs.

Install TensorFlow

$ pip install --upgrade -I setuptools
$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0rc

This is a workaround for a bug when installing TensorFlow; see this for more information.
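
For the 0.9.x release installed above, the basic pattern is to build a graph of constants/ops and run it inside a Session. A minimal hello-world sketch (this graph/Session API predates TensorFlow 2.x eager execution):

# build a tiny graph and run it in a session
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
a = tf.constant(2)
b = tf.constant(3)
with tf.Session() as sess:
    print(sess.run(hello))
    print(sess.run(a + b))   # 5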


Step 4: Install Keras


$ pip install git+git://github.com/fchollet/keras.git
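
To check that Keras imports and can compile a model against the chosen backend, a tiny Sequential network on random data is enough. A minimal sketch using the Keras 1.x-style API that a 2016 checkout provides (newer Keras versions use epochs instead of nb_epoch):

# build, compile, and briefly fit a toy model on random data
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

X = np.random.rand(32, 4)
y = np.random.randint(2, size=(32, 1))
model.fit(X, y, nb_epoch=2, batch_size=8, verbose=1)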

Step 5: spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.
Documentation and details: https://spacy.io/

Install spaCy with conda

$ conda install -c https://conda.anaconda.org/spacy spacy
$ conda install spacy
$ python -m spacy.en.download all --force

ref: https://github.com/spacy-io/spaCy
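
After the model download above, a short tokenization example confirms spaCy is usable. A minimal sketch using the old spacy.en entry point that matches the download command (newer spaCy releases use spacy.load('en') instead):

# tokenize a sentence and print each token with its part-of-speech tag
from spacy.en import English

nlp = English()
doc = nlp(u'spaCy makes it easy to preprocess text for Keras models.')
for token in doc:
    print(token.orth_, token.pos_)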

Step 6: scikit-learn (Machine Learning in Python)

scikit-learn is a Python module for machine learning built on top of SciPy.

  • Simple and efficient tools for data mining and data analysis
  • Built on NumPy, SciPy, and matplotlib

Install scikit-learn with conda

$ conda install scikit-learn

ref:

http://scikit-learn.org/stable/

https://github.com/scikit-learn/scikit-learn
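
A quick end-to-end check is to fit a classifier on the built-in iris dataset. A minimal sketch:

# fit a logistic regression on iris and report training accuracy
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression()
clf.fit(iris.data, iris.target)
print(clf.score(iris.data, iris.target))   # training accuracy, typically around 0.95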

Step 7: Python Progressbar

It is a text progress bar library for Python, typically used to display the progress of a long-running operation, providing a visual cue that processing is underway.

Install progressbar with pip

$ pip install progressbar

Ref:

https://github.com/WoLpH/python-progressbar
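
A small loop shows the typical usage: create a bar with a maximum value, start it, update it inside the loop, and finish. A minimal sketch assuming the classic progressbar API installed by pip install progressbar (the progressbar2 fork renames maxval to max_value):

# display a progress bar for a dummy long-running loop
import time
import progressbar

bar = progressbar.ProgressBar(maxval=50)
bar.start()
for i in range(50):
    time.sleep(0.05)   # stand-in for real work
    bar.update(i + 1)
bar.finish()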

Step 8: Test Keras

$ wget https://raw.githubusercontent.com/fchollet/keras/master/examples/mnist_mlp.py
$ python mnist_mlp.py

The result should look like this:

60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
8s - loss: 0.4356 - acc: 0.8716 - val_loss: 0.1863 - val_acc: 0.9421
Epoch 2/20
7s - loss: 0.1961 - acc: 0.9414 - val_loss: 0.1274 - val_acc: 0.9601
Epoch 3/20
7s - loss: 0.1451 - acc: 0.9564 - val_loss: 0.1010 - val_acc: 0.9691
Epoch 4/20
8s - loss: 0.1189 - acc: 0.9642 - val_loss: 0.0847 - val_acc: 0.9752
Epoch 5/20
8s - loss: 0.1019 - acc: 0.9691 - val_loss: 0.0850 - val_acc: 0.9735
Epoch 6/20
8s - loss: 0.0903 - acc: 0.9721 - val_loss: 0.0749 - val_acc: 0.9777
Epoch 7/20
8s - loss: 0.0822 - acc: 0.9745 - val_loss: 0.0753 - val_acc: 0.9762
Epoch 8/20
7s - loss: 0.0758 - acc: 0.9762 - val_loss: 0.0743 - val_acc: 0.9796
Epoch 9/20
7s - loss: 0.0705 - acc: 0.9780 - val_loss: 0.0720 - val_acc: 0.9784
Epoch 10/20
8s - loss: 0.0648 - acc: 0.9790 - val_loss: 0.0688 - val_acc: 0.9793
Epoch 11/20
8s - loss: 0.0592 - acc: 0.9819 - val_loss: 0.0663 - val_acc: 0.9797
Epoch 12/20
8s - loss: 0.0567 - acc: 0.9824 - val_loss: 0.0677 - val_acc: 0.9815
Epoch 13/20
8s - loss: 0.0536 - acc: 0.9833 - val_loss: 0.0711 - val_acc: 0.9796
Epoch 14/20
8s - loss: 0.0520 - acc: 0.9834 - val_loss: 0.0684 - val_acc: 0.9806
Epoch 15/20
9s - loss: 0.0500 - acc: 0.9837 - val_loss: 0.0664 - val_acc: 0.9807
Epoch 16/20
7s - loss: 0.0471 - acc: 0.9850 - val_loss: 0.0683 - val_acc: 0.9809
Epoch 17/20
7s - loss: 0.0449 - acc: 0.9856 - val_loss: 0.0682 - val_acc: 0.9812
Epoch 18/20
8s - loss: 0.0433 - acc: 0.9860 - val_loss: 0.0675 - val_acc: 0.9813
Epoch 19/20
7s - loss: 0.0401 - acc: 0.9869 - val_loss: 0.0683 - val_acc: 0.9819
Epoch 20/20
8s - loss: 0.0383 - acc: 0.9874 - val_loss: 0.0705 - val_acc: 0.9820
Test score: 0.0704572771238
Test accuracy: 0.982

Step 9: Set up the Keras backend

$ mkdir -p ~/.keras
$ echo '{"epsilon":1e-07,"floatx":"float32","backend":"tensorflow"}' > ~/.keras/keras.json
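
To confirm Keras picked up the configuration, import the backend module and print what it reports. A minimal check (assumes a Keras version that exposes keras.backend.backend(); importing Keras should also print a "Using TensorFlow backend." banner):

# report the active backend and default float type from ~/.keras/keras.json
from keras import backend as K

print(K.backend())   # expected: tensorflow
print(K.floatx())    # expected: float32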

Step 10: Make sure that Keras runs with TensorFlow

$ curl -sSL https://github.com/fchollet/keras/raw/master/examples/mnist_mlp.py | python

------------------------------------------------------------------------------------

Reference

http://keras.io/

http://ermaker.github.io/blog/2015/09/08/get-started-with-keras-for-beginners.html

http://ermaker.github.io/blog/2016/06/22/get-started-with-keras-for-beginners-tensorflow-backend.html

https://gajumaru4444.github.io/2015/11/10/Visual-Question-Answering-2.html