Code For LSTM and CNN

TemporalConvolution Example

—————————————————————

Model: Input -> CNN -> LSTM -> Softmax

——————————————————–

(Mini-batching using the rnn package)

local net = nn.Sequential()

net:add(nn.Sequencer(nn.TemporalConvolution(256,200,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.TemporalMaxPooling(2,2)))
net:add(nn.Sequencer(nn.TemporalConvolution(200,170,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.TemporalConvolution(170,150,1,1)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.Sequencer(nn.BatchNormalization(150)))
net:add(nn.Sequencer(nn.Linear(150,120)))
net:add(nn.Sequencer(nn.ReLU()))
net:add(nn.BiSequencer(nn.FastLSTM(120,40),nn.FastLSTM(120,40)))
net:add(nn.Sequencer(nn.BatchNormalization(40*2)))
net:add(nn.Sequencer(nn.Linear(40*2,27)))
net:add(nn.Sequencer(nn.SoftMax()))


Ref paper : Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

http://arxiv.org/pdf/1512.02595v1.pdf

Group

https://groups.google.com/forum/#!topic/torch7/YAQcqminACY



 

This tutorial will focus on giving you the working knowledge to implement and test a convolutional neural network with Torch. If you have not yet set up your machine, please go back and give this a read before starting.

You can start iTorch by opening a console, navigating to <install dir>/torch/iTorch and typing ‘itorch notebook’. Once this is done, a webpage should pop up, which is served from your own local machine. Go ahead and click the ‘New Notebook’ button.

Because we’re just starting out, we will begin with a very simple problem. Suppose you have a signal which needs to be classified as either a square pulse or a triangular pulse. Each pulse is sampled over time. To make the problem slightly more challenging, let’s say the pulse is not always in the same place, and that it can have constrained but random height and width. There are several techniques we could use to solve this problem. We could do signal processing, such as taking the FFT, or we could code up our own custom filters. But that involves work, and quickly becomes impractical for larger problems. So what do we do? We can build a convolutional neural network!

Convolutional Networks
Convolutional Layers
The network will start out with a 64×1 vector, effectively a 1-D signal with each value representing the signal strength at a point in time. Next we convolve those 64 points with ten kernels, each with 7 elements. These kernel weights will act as filters, or features. We don’t know yet what the values will be, since they will be learned as we train the network. Layers of the network that take an input and convolve one or more filters over it to create an output are called convolutional layers. Example:

Convolution 3×1 kernel, 8×1 input

Input: 2 4 3 6 5 3 7 6
Kernel values: -1 2 -1
Output: 3 -4 4 1 -6 5
Further explanation: (-1*2)+(2*4)+(-1*3) = 3
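The arithmetic above is easy to verify outside Torch. Here is a minimal, Torch-free Python sketch of the same ‘valid’ 1-D convolution (strictly, a cross-correlation, which is what convolutional layers actually compute; with a symmetric kernel like this one the two are identical):

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal (no padding) and take dot products."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

print(conv1d_valid([2, 4, 3, 6, 5, 3, 7, 6], [-1, 2, -1]))
# -> [3, -4, 4, 1, -6, 5], matching the worked example
```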

Pooling layer
After convolutional layers, there is frequently a pooling layer. This layer is used to reduce the problem size, and thus speed up training greatly. Typically, MaxPooling is used, which acts like a kind of convolution, except that it has a stride usually equal to the kernel size, and the ‘kernel’ simply takes the maximum value of its input window and outputs that maximum. This is great for classification problems such as this one, because the position of the signal isn’t very important, just whether it is square or triangular. So pooling layers throw away some positioning data, but make the problem smaller and easier to train. Example:

Max pooling layer, size 2, stride 2
Input: 3 5 7 6 3 4
Output: 5 7 4
Further explanation: Max(3,5) = 5, Max(7,6) = 7, Max(3,4) = 4
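The same windowed maximum, as a small Python sketch of the pooling example above:

```python
def max_pool1d(signal, size, stride):
    """Take the max over each full window of `size`, stepping by `stride`."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, stride)]

print(max_pool1d([3, 5, 7, 6, 3, 4], 2, 2))
# -> [5, 7, 4], matching the worked example
```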

Activation Function
Neural networks achieve their power by introducing non-linearities into the system. Otherwise, networks just become big linear algebra problems, and there is no point in having many layers. In days past, the sigmoid used to be most common, however, recent breakthroughs have indicated that ReLU is a much better operator for deep neural networks. Basically, it is just ‘y = max(0,x)’. So if x is negative, y is 0, otherwise, y is equal to x. Example:

Input: 4 6 2 -4
Output: 4 6 2 0
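And the ReLU, which is nothing more than an element-wise `max(0, x)`:

```python
def relu(values):
    """Clamp negative values to zero, pass positive values through."""
    return [max(0, v) for v in values]

print(relu([4, 6, 2, -4]))
# -> [4, 6, 2, 0], matching the worked example
```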

 

————————————————————————–

Awesome Example for TemporalConvolution

————————————————————–

First things first, be sure to include the neural network package.

-- First, be sure to require the 'nn' package for the neural network functions
require 'nn';

Next, we’ll need to create some training data. Neural networks require many examples in order to train, so we choose to generate 10000 example signals. This number may seem large, but remember that each wave has 4 randomized components: type, height, width, and start index. This translates to 2*6*21*6 = 1512 possible permutations. In real life, problems are much more complex.

-- Next, create the training data. We'll use 10000 samples for now
nExamples = 10000

trainset = {}
trainset.data = torch.Tensor(nExamples,64,1):zero() -- Data will be sized as 10000x64x1
trainset.label = torch.Tensor(nExamples):zero()     -- Use one dimensional tensor for label

--The network trainer expects an index metatable
setmetatable(trainset, 
{__index = function(t, i) 
    return {t.data[i], t.label[i]}  -- The trainer is expecting trainset[123] to be {data[123], label[123]}
    end}
);

--The network trainer expects a size function
function trainset:size() 
    return self.data:size(1) 
end

function GenerateTrainingSet()

    -- Time to prepare the training set with data
    -- At random, have data be either a triangular pulse, or a rectangular pulse
    -- Have randomness as to when the signal starts, ends, and how high it is
    for i=1,nExamples do
        curWaveType = math.random(1,2)      -- 1 for triangular signal, 2 for square pulse
        curWaveHeight = math.random(5,10)   -- how high is signal
        curWaveWidth = math.random(20,40)   -- how wide is signal
        curWaveStart = math.random(5,10)    -- when to start signal
    
        for j=1,curWaveStart-1 do
            trainset.data[i][j][1] = 0
        end
    
        if curWaveType==1 then   -- We are making a triangular wave
            delta = curWaveHeight / (curWaveWidth/2);
            for curIndex=1,curWaveWidth/2 do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * curIndex
            end
            for curIndex=(curWaveWidth/2)+1, curWaveWidth do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * (curWaveWidth-curIndex)
            end
            trainset.label[i] = 1
        else
            for j=1,curWaveWidth do
                trainset.data[i][curWaveStart-1+j][1] = curWaveHeight
            end
            trainset.label[i] = 2
        end
    end
end

GenerateTrainingSet()
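To sanity-check the pulse construction, here is a close, Torch-free Python mirror of the Lua loops above (`make_wave` is a hypothetical helper, not part of the tutorial; it uses an integer half-width, so it matches the Lua exactly for even widths):

```python
def make_wave(wave_type, height, width, start, n=64):
    """Mirror the Lua loops: a triangular (type 1) or square (type 2) pulse."""
    sig = [0.0] * n  # zero everywhere except the pulse
    if wave_type == 1:
        delta = height / (width // 2)
        for i in range(1, width // 2 + 1):           # rising edge
            sig[start - 2 + i] = delta * i
        for i in range(width // 2 + 1, width + 1):   # falling edge
            sig[start - 2 + i] = delta * (width - i)
    else:
        for j in range(1, width + 1):                # flat top
            sig[start - 2 + j] = height
    return sig

tri = make_wave(1, 10, 20, 5)
print(max(tri))  # the triangle peaks exactly at the requested height: 10.0
```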

Next, we will construct our neural network. Starting with 64×1 data going in, we pass through two Convolution-MaxPool-ReLU ‘layers’, then finish with a two-layer fully connected network that produces two outputs. Because this is a classification problem, we’ll use log-probability output. Whichever output is greatest (closest to zero) is the selection of the network; the other output should have a more negative value.

-- This is where we build the model
model = nn.Sequential()                       -- Create network

-- First convolution, using ten, 7-element kernels
model:add(nn.TemporalConvolution(1, 10, 7))   -- 64x1 goes in, 58x10 goes out
model:add(nn.TemporalMaxPooling(2))           -- 58x10 goes in, 29x10 goes out
model:add(nn.ReLU())                          -- non-linear activation function

-- Second convolution, using 5, 7-element kernels
model:add(nn.TemporalConvolution(10, 5, 7))   -- 29x10 goes in, 23x5 goes out
model:add(nn.TemporalMaxPooling(2))           -- 23x5 goes in, 11x5 goes out
model:add(nn.ReLU())                          -- non-linear activation function

-- After convolutional layers, time to do fully connected network
model:add(nn.View(11*5))                        -- Reshape network into 1D tensor

model:add(nn.Linear(11*5, 30))                  -- Fully connected layer, 55 inputs, 30 outputs
model:add(nn.ReLU())                            -- non-linear activation function

model:add(nn.Linear(30, 2))                     -- Final layer has 2 outputs. One for triangle wave, one for square
model:add(nn.ReLU())                            -- non-linear activation (note: a ReLU directly before LogSoftMax is unusual, as it clamps negative scores to zero)
model:add(nn.LogSoftMax())                      -- log-probability output, since this is a classification problem

With Torch, we can see the dimensions of a tensor by prefixing it with ‘#’. So at any point while constructing the network, you can build a partially complete network, propagate a blank tensor through it, and inspect the dimensions of the last layer.

-- When building the network, we can test the shape of the output by sending in a dummy tensor
#model:forward(torch.Tensor(64,1))
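The per-layer sizes in the model comments all follow from the ‘valid’ output-length formula, out = floor((in - kernel)/stride) + 1. A quick Python check of the chain:

```python
def out_len(n, kernel, stride=1):
    """Output length of a 'valid' temporal convolution or pooling layer."""
    return (n - kernel) // stride + 1

n = 64
n = out_len(n, 7)      # conv1: 64 -> 58
n = out_len(n, 2, 2)   # pool1: 58 -> 29
n = out_len(n, 7)      # conv2: 29 -> 23
n = out_len(n, 2, 2)   # pool2: 23 -> 11
print(n)  # -> 11, so the flattened size for nn.View is 11*5 = 55
```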

Next, we set our criterion to nn.ClassNLLCriterion, which is suited to classification problems. We then create a trainer using the StochasticGradient descent algorithm and set the learning rate and number of iterations. If the learning rate is too high, the network will not converge; if it is too low, the network will converge too slowly. It takes practice to get this just right.

criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.01
trainer.maxIteration = 200 -- do 200 epochs of training

Finally, we train our model! Go grab a cup of coffee, it may take a while. Later we will focus on accelerating these training sessions with the GPU, but our network is so small right now that it isn’t practical to accelerate.

trainer:train(trainset)

We can see what an example output and label are below.

-- Lets see an example output
model:forward(trainset.data[123])

-- Lets see which label that is
trainset.label[123]

Let’s figure out how many of the examples are predicted correctly.

function TestTrainset()
    correct = 0
    for i=1,nExamples do
        local groundtruth = trainset.label[i]
        local prediction = model:forward(trainset.data[i])
        local confidences, indices = torch.sort(prediction, true)  -- sort in descending order
        if groundtruth == indices[1] then
            correct = correct + 1
        else
            --print("Incorrect! "..tostring(i))
        end
    end
    print(tostring(correct))
end

-- Lets see how many out of the 10000 samples we predict correctly!
TestTrainset()
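TestTrainset sorts the log-probabilities in descending order and takes the top index, which is just an argmax. The same selection logic in plain Python (1-based to match Lua indexing):

```python
def predicted_class(log_probs):
    """Return the 1-based index of the largest log-probability."""
    return max(range(len(log_probs)), key=lambda i: log_probs[i]) + 1

print(predicted_class([-0.02, -3.9]))  # -> 1 (triangular)
print(predicted_class([-2.5, -0.08]))  # -> 2 (square)
```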

Hopefully, that number reads 10,000. Next, let’s be sure our network is really well trained. Let us generate new training sets and test them. Hopefully everything will still score 10,000, but if there are some incorrect examples, go back and train some more. In real life we can suffer from a phenomenon called overfitting, where the model fits the training data too closely and generalizes poorly, but we will cover this in a later article. Try to train your network until it passes everything you can throw at it.

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    TestTrainset()
end

Great, you’ve done it! Now, let’s try to gain some understanding of what’s going on here. We created two convolutional layers, the first having ten 1×7 kernels, and the second having five 10×7 kernels. The reason I use iTorch instead of the command-line Torch interface is so I can easily inspect graphics. Let’s take a look at the filters in the first convolutional layer. We can see that each row is a filter.

require 'image'
itorch.image(model.modules[1].weight)

Kernel_1

We can also see which neurons activate the most. You can propagate any input through the network with the :forward function, as demonstrated earlier. Then we can visualize the outputs of the ReLU (or any other) layers. For example, here is the output of the first ReLU layer. It is obvious that some filters are activating more than others.

itorch.image(model.modules[3].output)

ReLU_1

Next, let’s take a look at the second ReLU layer’s output. Here we can see that the neurons in the 5th filter map are by far the most active for this input. So we know that even if our filters look a little chaotic, neurons in a particular layer do activate and stand out. Finally, these values are sent to the fully connected network, which makes sense of what it means when different filters are activated in relation to other filters.

itorch.image(model.modules[6].output)

ReLU_2

Now that we understand how different filters activate with certain inputs, let us introduce noise into the system and see how the neural network deals with it.

function IntroduceNoise()
    for i=1,nExamples do
        for j=1,64 do
            trainset.data[i][j] = trainset.data[i][j] + torch.normal(0,.25);
        end
    end
end
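The same noise injection in a Torch-free Python sketch, where `random.gauss(0, 0.25)` plays the role of `torch.normal(0, .25)` (the seed parameter is an addition for reproducibility, not in the original):

```python
import random

def add_noise(signal, sigma=0.25, seed=None):
    """Add zero-mean Gaussian noise to every sample of the signal."""
    rng = random.Random(seed)
    return [v + rng.gauss(0, sigma) for v in signal]

noisy = add_noise([0.0] * 64, seed=42)
print(len(noisy))  # still 64 samples, now jittered around zero
```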

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    IntroduceNoise()
    TestTrainset()
end

After training my network around 600 epochs, I was able to achieve 100% perfect signal categorization with the noisy inputs, even though I only trained on the noiseless inputs. Wow! This shows us that the network does indeed work, and is powerful enough to filter out noise which happens in real life data. Next, we will be ready for more interesting challenges!

-- To see the network's structure and variables
model.modules

Thanks to this site:

http://supercomputingblog.com/machinelearning/an-intro-to-convolutional-networks-in-torch/

————————————

Ref:

https://stackoverflow.com/questions/36771635/after-loading-a-trained-model-in-torch-how-to-use-this-loaded-model-to-classify

https://groups.google.com/forum/#!topic/torch7/XL30bTW6mNs

https://groups.google.com/forum/#!topic/torch7/jdZ15JjVLSw

https://groups.google.com/forum/#!topic/torch7/peOLx3tfuSQ

http://staff.ustc.edu.cn/~cheneh/paper_pdf/2014/Yi-Zheng-WAIM2014.pdf

————————————

Convolution Tut: Temporal and Spatial

http://torch.ch/torch3/matos/convolutions.pdf


 

lookupTableLayer = nn.LookupTable(vector:size()[1], d)
for i=1,vector:size()[1] do
  lookupTableLayer.weight[i] = vector[i]
end
mlp=nn.Sequential();
mlp:add(lookupTableLayer)
mlp:add(nn.TemporalConvolution(d,H,K,dw))
mlp:add(nn.Tanh())
mlp:add(nn.Max(1))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(H,d))

Now, to train the network, I loop through every training example and for every example I call gradUpdate() which has this code (this is straight from the examples):

function gradUpdate(mlp, x, indexY, learningRate)
  local pred = mlp:forward(x)                    -- forward pass
  local gradCriterion = findGrad(pred, indexY)   -- gradient of the loss w.r.t. the prediction
  mlp:zeroGradParameters()                       -- clear accumulated gradients
  mlp:backward(x, gradCriterion)                 -- backward pass
  mlp:updateParameters(learningRate)             -- apply the SGD update
end
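gradUpdate performs one step of plain stochastic gradient descent: forward, zero the gradient buffers, backward, then update. The same cycle, reduced to a one-parameter model with squared-error loss in a Python sketch:

```python
def sgd_step(w, x, y, lr):
    """One SGD step for the model pred = w*x with squared-error loss."""
    pred = w * x                  # forward pass
    grad_w = 2 * (pred - y) * x   # backward: d/dw of (pred - y)^2
    return w - lr * grad_w        # parameter update

w = 0.0
for _ in range(100):
    w = sgd_step(w, x=1.0, y=3.0, lr=0.1)
print(round(w, 4))  # -> 3.0, converging toward y/x
```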

https://github.com/ganeshjawahar/torch-teacher/blob/master/stanford/model_nngraph.lua

-- encode the question
local question_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(question):annotate{name = 'question_word_lookup'}
local question_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))(nn.SplitTable(1, 2)(question_word_vectors)):annotate{name = 'question_encoder'}
local final_q_out = nn.Dropout(params.dropout)(nn.Unsqueeze(3)(nn.SelectTable(1)(question_encoder))) -- get the last step output

-- encode the passage
local passage_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(passage):annotate{name = 'passage_word_lookup'}
local passage_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))(nn.SplitTable(1, 2)(passage_word_vectors)):annotate{name = 'passage_encoder'}
local final_p_out = nn.Dropout(params.dropout)(nn.View(params.bsize, 1, 2 * params.hid_size)(nn.JoinTable(2)(passage_encoder))) -- combine the forward and backward RNNs' output

l = nn.LookupTableMaskZero(3, 1)
print(l:forward(torch.LongTensor{1}))
print(l:forward(torch.LongTensor{0}))


https://github.com/chapternewscu/image-captioning-with-semantic-attention/blob/master/test_attention_weights_criterion.lua
function Seq2Seq:buildModel()
  self.encoder = nn.Sequential()
  self.encoder:add(nn.LookupTableMaskZero(self.vocabSize, self.hiddenSize))
  self.encoderLSTM = nn.FastLSTM(self.hiddenSize, self.hiddenSize):maskZero(1)
  self.encoder:add(nn.Sequencer(self.encoderLSTM))
  self.encoder:add(nn.Select(1,1))

  self.decoder = nn.Sequential()
  self.decoder:add(nn.LookupTableMaskZero(self.vocabSize, self.hiddenSize))
  self.decoderLSTM = nn.FastLSTM(self.hiddenSize, self.hiddenSize):maskZero(1)
  self.decoder:add(nn.Sequencer(self.decoderLSTM))
  self.decoder:add(nn.Sequencer(nn.MaskZero(nn.Linear(self.hiddenSize, self.vocabSize), 1)))
  self.decoder:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(), 1)))

  self.encoder:zeroGradParameters()
  self.decoder:zeroGradParameters()
end
Ref: https://github.com/Element-Research/rnn/issues/155
-- Encoder
local enc = nn.Sequential()
enc:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local encLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
enc:add(nn.Sequencer(encLSTM))
enc:add(nn.SelectTable(-1))

-- Decoder
local dec = nn.Sequential()
dec:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local decLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
dec:add(nn.Sequencer(decLSTM))
dec:add(nn.Sequencer(nn.MaskZero(nn.Linear(opt.hiddenSize, opt.vocabSize),1)))
dec:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(),1)))
-- dec = nn.MaskZero(dec,1)

Ref: https://groups.google.com/forum/#!topic/torch7/ZUu4KhBqZ_0

I implemented the model using Nicholas Leonard’s rnn package (https://github.com/Element-Research/rnn) as follows:
model = nn.Sequential()
model:add(nn.LookupTableMaskZero(vocabSize, embeddingSize))
model:add(nn.SplitTable(1, 2))
 
lstm = nn.MaskZero(
  nn.Sequencer(
  nn.Sequential()
  :add(nn.LSTM(embeddingSize,nHidden))
  :add(nn.Dropout())
  :add(nn.LSTM(nHidden,nHidden))
  :add(nn.Dropout())
  :add(nn.Linear(nHidden, vocabSize))
  :add(nn.LogSoftMax())
  ), 1)
 
model:add(lstm)
criterion = nn.SequencerCriterion(nn.ClassNLLCriterion()) -- not using SequencerCriterion as we only use the last output


Ref: https://github.com/Element-Research/rnn/issues/75

use MaskZeroCriterion

function newModelBuild(dictionarySize, nbfeatures, embeddingSize, rhoInput, rhoOutput, lktype, logsoftFlag)
local model=nn.Sequential()
local p=nn.ParallelTable() 
p:add(nn.Identity()) --  -> carries the tensor of features
local lkt=nn.LookupTable(dictionarySize, embeddingSize)
local weightmatrix
if lktype == 0 then 
    weightmatrix=torch.Tensor(dictionarySize,embeddingSize)
    for i=1,dictionarySize do
        for j=1,embeddingSize do
            weightmatrix[i][j]=torch.uniform(0,1)
        end
    end
    lkt.weight:copy(weightmatrix)
else 
    lkt.weight:fill(1.0/embeddingSize)
end
p:add(nn.Sequencer(lkt)) -- ->ListofTensor(batchSize X embeddingSize)
model:add(p)
local SliceList=nn.ConcatTable() -- purpose: create a list tensor created by joining   tensors
for i=1, rhoInput do   
    local Slice =nn.Sequential()
    SliceList:add(Slice)
    local cc=nn.ConcatTable()   -- contains the 2 tensors to join
    Slice:add(cc)
    local a=nn.Sequential()
    cc:add(a)
    a:add(nn.SelectTable(2)) -- we select list of tensor(i)
    a:add(nn.SelectTable(i))  -- we select a tensor(i)
    local b=nn.Sequential()
    cc:add(b)
    b:add(nn.SelectTable(1)) -- we select  tensorF
    Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
for i=rhoInput+1,rhoOutput do
    local Slice =nn.Sequential()
    SliceList:add(Slice)
    local cc=nn.ConcatTable()   -- contains the 2 tensors to join
    Slice:add(cc)
    local a=nn.Sequential()
    cc:add(a)
    a:add(nn.SelectTable(2)) -- we select list of tensor(i)
    a:add(nn.MaskZero(nn.SelectTable(i),1))  -- we select a tensor(i) : put at 0***********
    local b=nn.Sequential()
    cc:add(b)
    b:add(nn.SelectTable(1)) -- we select  tensorF
    Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
model:add(SliceList)
model:add(nn.Sequencer(nn.FastLSTM(embeddingSize+nbfeatures, embeddingSize, rhoOutput)))
model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
if logsoftFlag then model:add(nn.Sequencer(nn.LogSoftMax())) end
return model
end

 

lookuptable

 

REF:

http://stackoverflow.com/questions/29412658/torch-lookuptable-and-gradient-update

———————————————————-
I want to perform zero padding before TemporalConvolution (after the lookup table), in order to make sure that the input size is not less than the convolution window size. Here is my network:

model:add(nn.LookupTable(DICTIONARY_SIZE, DICTIONARY_VEC_DIMENTION))
model:add(nn.Padding(...)) -- padding should go here
model:add(nn.TemporalConvolution(DICTIONARY_VEC_DIMENTION, K, CONV_WINDOW_SIZE, 1))

This problem was solved by using LookupTableMaskZero from the rnn package.

————————————————————————–

 

LRCN

https://github.com/garythung/torch-lrcn

 

CNN link

http://nn.readthedocs.io/en/rtd/convolution/#convolutional-layers

http://nn.readthedocs.io/en/rtd/convolution/index.html#spatialconvolution

 

 

lookup table

http://torch5.sourceforge.net/manual/nn/index-2-5-5.html

https://stackoverflow.com/questions/37748421/lstm-on-top-of-cnn

 

local function create_network()
  local x                = nn.Identity()()
  local y                = nn.Identity()()
  local prev_s           = nn.Identity()()
  local i                = {[0] = LookupTable(params.vocab_size,
                                                    params.rnn_size)(x)}
  local next_s           = {}
  local split         = {prev_s:split(2 * params.layers)}
  for layer_idx = 1, params.layers do
    local prev_c         = split[2 * layer_idx - 1]
    local prev_h         = split[2 * layer_idx]
    local dropped        = nn.Dropout(params.dropout)(i[layer_idx - 1])
    local next_c, next_h = lstm(dropped, prev_c, prev_h)
    table.insert(next_s, next_c)
    table.insert(next_s, next_h)
    i[layer_idx] = next_h
  end
  local h2y              = nn.Linear(params.rnn_size, params.vocab_size)
  local dropped          = nn.Dropout(params.dropout)(i[params.layers])
  local pred             = nn.LogSoftMax()(h2y(dropped))
  local err              = nn.ClassNLLCriterion()({pred, y})
  local module           = nn.gModule({x, y, prev_s},
                                      {err, nn.Identity()(next_s)})
  module:getParameters():uniform(-params.init_weight, params.init_weight)
  return transfer_data(module)
end
