# Text Similarity

https://aclweb.org/anthology/S/S16/S16-1170.pdf

http://ttic.uchicago.edu/~kgimpel/papers/he+etal.emnlp15.pdf

http://web.eecs.umich.edu/~honglak/naacl2016-dscnn.pdf

https://cs.uwaterloo.ca/~jimmylin/publications/He_etal_NAACL-HTL2016.pdf

http://nlp.cs.berkeley.edu/pubs/FrancisLandau-Durrett-Klein_2016_EntityConvnets_paper.pdf

http://emnlp2014.org/papers/pdf/EMNLP2014181.pdf

http://arxiv.org/pdf/1503.08909v2.pdf

http://arxiv.org/pdf/1504.01561v1.pdf

Code

https://github.com/hohoCode/textSimilarityConvNet

# Convolution for NLP: Temporal Convolution

(Convolutional Neural Networks for Sentence Classification)

https://github.com/harvardnlp/sent-conv-torch

https://github.com/FredericGodin/DynamicCNN

https://github.com/harvardnlp/seq2seq-attn

https://github.com/harvardnlp/sent-conv-torch/blob/master/TempConv.ipynb

https://en.wikipedia.org/wiki/Convolutional_neural_network

http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/

http://stats.stackexchange.com/questions/182102/what-do-the-fully-connected-layers-do-in-cnns

https://www.quora.com/Why-are-fully-connected-layers-used-at-the-very-end-output-side-of-convolutional-NNs-Why-not-earlier

# CPU to GPU

Ref

http://kbullaughey.github.io/lstm-play/2015/09/21/torch-and-gpu.html

https://github.com/torch/cunn

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

… or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)



## Using a GPU in Torch

Using a GPU in Torch is incredibly easy. Getting set up is simply a matter of requiring the cutorch package and using the CudaTensor type for your tensors.

cutorch = require 'cutorch'
x = torch.CudaTensor(2,2):uniform(-1,1)

Now all of the operations that involve x will be computed on the GPU.

If you have a tensor that is not a CudaTensor but want to make it one, you can use the cuda() function to return a CudaTensor copy of the original:

x = torch.Tensor(2,2):zero()
xCuda = x:cuda()

You can see what type of tensor you have by inspecting it in the console:

th> x
0  0
0  0
[torch.DoubleTensor of size 2x2]
th> xCuda
0  0
0  0
[torch.CudaTensor of size 2x2]


You can also convert back to a CPU tensor:

th> y = xCuda:double()
th> y
0  0
0  0
[torch.DoubleTensor of size 2x2]


Keep in mind that the parameter matrices of the nn.Module objects also need to be configured for GPU use, as these contain internal tensors for storing parameters, and the forward/backward propagation state.

Lucky for us, these also have cuda() methods:

linearMap = nn.Linear(M,M):cuda()

# Thoughts on torch

Ref:

http://kbullaughey.github.io/lstm-play/2015/09/21/thoughts-on-torch.html

Basically the only hard thing I need to do when developing with torch is thinking about tensor dimensions. It seems an inordinate amount of my brain cycles are consumed this way. But I don’t fault torch for this, as I think it’s an unavoidable aspect of working with multi-dimensional tensors.

### Lua

Lua also has many great things going for it, and by proxy these are also reasons why torch is great:

1. Very fast execution time (very little reason to consider C++ or other compiled languages).
2. Can be easily embedded in other applications.
3. Nice profiler provided by LuaJIT.
4. Since it is interpreted, interactive prototyping makes it easy to explore how things work.

Unfortunately, there are a number of not so fun aspects of lua:

1. Feels primitive and very bare-bones compared to other scripting languages.
2. Debugging facilities seem rather lacking.
3. nil. Because variables don’t need to be defined, spelling mistakes and other minor errors result in nil, which, combined with poor stack traces, sometimes makes it hard to locate the problem.

# Show Line Numbers in Jupyter Notebook

Shortcut key: L toggles line numbers.

# HDF

Hierarchical Data Format (HDF) technologies are used to manage large and complex data collections and to ensure long-term access to HDF data.

## HDF5

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

Ref

http://www.hdfgroup.org/

# H5py

The h5py package is a Pythonic interface to the HDF5 binary data format.

It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want.
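As a minimal sketch of that workflow (assuming h5py and NumPy are installed; the file name `example.h5` and dataset name `measurements` are arbitrary choices for illustration):

```python
import numpy as np
import h5py

# Write a small dataset to an HDF5 file.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("measurements", data=np.arange(100).reshape(10, 10))

# Read back only a slice; for large datasets, only this block is read from disk.
with h5py.File("example.h5", "r") as f:
    block = f["measurements"][2:4, :3]

print(block.tolist())  # → [[20, 21, 22], [30, 31, 32]]
```

The slicing syntax is the same as for a NumPy array, which is the point of the "as if they were real NumPy arrays" claim above.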

You can use python-h5py.

sudo apt-get install python-h5py

And then in your Python file try:

import h5py

Ref

http://www.h5py.org/

http://stackoverflow.com/questions/24744969/installing-h5py-on-an-ubuntu-server

ML

http://www.cs.utah.edu/~piyush/teaching/cs5350.html

http://vis.lbl.gov/~romano/mlgroup/papers/linear-dim-red.pdf

https://www.cs.utah.edu/~piyush/teaching/

http://vis.lbl.gov/~romano/mlgroup/papers/

————————————————————–

# Vision and NLP Conf

• CVPR: IEEE Conference on Computer Vision and Pattern Recognition
• ICCV: International Conference on Computer Vision
• ECCV: European Conference on Computer Vision
• NIPS: Neural Information Processing Systems
• ICLR: International Conference on Learning Representations
• ICML: International Conference on Machine Learning
• EMNLP: Empirical Methods in Natural Language Processing
• ACL: Association for Computational Linguistics
• NAACL: The North American Chapter of the Association for Computational Linguistics
• ACCV: Asian Conference on Computer Vision
• IJCV: International Journal of Computer Vision

# IP Conf

IEEE transactions on Image Processing
IEEE transactions on Signal Processing
IEEE transactions on Multimedia
ICIP
ICASSP
SIGIR
———————————————-
# Projects

Past CS229 Projects: Example projects from Stanford machine learning class

http://cs224d.stanford.edu/reports_2016.html

http://cs224d.stanford.edu/reports_2015.html

http://cs224d.stanford.edu/project.html

http://cs231n.stanford.edu/reports2016.html

http://cs231n.stanford.edu/project.html

# Code For LSTM and CNN

TemporalConvolution Example

—————————————————————

Model: Input -> CNN -> LSTM -> Softmax

——————————————————–

(Mini-batching using RNN)

local net = nn.Sequential()

Ref paper : Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

http://arxiv.org/pdf/1512.02595v1.pdf

Group

## An Intro to Convolutional Networks in Torch

This tutorial will focus on giving you working knowledge to implement and test a convolutional neural network with torch. If you have not yet setup your machine, please go back and give this a read before starting.

You can start iTorch by opening a console, navigating to <install dir>/torch/iTorch and typing ‘itorch notebook’. Once this is done, a webpage should pop up, which is served from your own local machine. Go ahead and click the ‘New Notebook’ button.

Because we’re just starting out, we will start with a very simple problem to solve. Suppose you have a signal which needs to be classified as either a square pulse or a triangular pulse. Each pulse is sampled over time. To make the problem slightly more challenging, let’s say the pulse is not always in the same place, and the pulse can have constrained but random height and width. There are several techniques we could use to solve this problem. We could do signal processing such as taking the FFT, or we could code up our own custom filters. But that involves work, and also becomes impossible when faced with larger problems. So what do we do? We can build a convolutional neural network!

### Convolutional Networks

### Convolutional Layers

The network will start out with a 64×1 vector, which we can effectively call a 1-D vector with each value representing the signal strength at each point in time. Next we apply a convolution of those 64 points with ten kernels, each with 7 elements. These kernel weights will act as filters, or features. We don’t know yet what the values will be, since they will be learned as we train the network. Layers of the network that take an input, and convolve one or more filters to create an output, are called convolutional layers. Example:

Convolution 3×1 kernel, 8×1 input

Input: 2 4 3 6 5 3 7 6
Kernel values: -1 2 -1
Output: 3 -4 4 1 -6 5
Further explanation: (-1*2)+(2*4)+(-1*3) = 3
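The same arithmetic can be checked with a few lines of plain Python (an illustrative sketch, not Torch code; `conv1d_valid` is just a throwaway helper name):

```python
def conv1d_valid(x, k):
    """1-D 'valid' cross-correlation, as a convolutional layer computes it."""
    n = len(k)
    return [sum(k[j] * x[i + j] for j in range(n)) for i in range(len(x) - n + 1)]

print(conv1d_valid([2, 4, 3, 6, 5, 3, 7, 6], [-1, 2, -1]))
# → [3, -4, 4, 1, -6, 5], matching the worked example above
```

Note the output is shorter than the input by kernel size minus one, which is why a 64×1 input through a 7-element kernel gives 58 outputs later in this tutorial.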

### Pooling Layer

After convolutional layers, there is frequently a pooling layer. This layer is used to reduce the problem size, and thus speed up training greatly. Typically, MaxPooling is used, which acts like a kind of convolution, except that it has a stride usually equal to the kernel size, and the ‘kernel’ simply takes the maximum value of its input, and outputs that maximum value. This is great for classification problems such as this one, because the position of the signal isn’t very important, just whether it is square or triangular. So pooling layers throw away some positioning data, but make the problem smaller and easier to train. Example:

Max pooling layer, size 2, stride 2
Input: 3 5 7 6 3 4
Output: 5 7 4
Further explanation: Max(3,5) = 5, Max(7,6) = 7, Max(3,4) = 4
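A plain-Python sketch of the same pooling operation (illustrative helper, not Torch code):

```python
def max_pool1d(x, size, stride):
    """Max pooling: slide a window over x and keep only the maximum of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]

print(max_pool1d([3, 5, 7, 6, 3, 4], 2, 2))
# → [5, 7, 4], matching the worked example above
```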

### Activation Function

Neural networks achieve their power by introducing non-linearities into the system. Otherwise, networks just become big linear algebra problems, and there is no point in having many layers. In days past, the sigmoid used to be most common; however, recent breakthroughs have indicated that ReLU is a much better operator for deep neural networks. Basically, it is just ‘y = max(0,x)’. So if x is negative, y is 0; otherwise, y is equal to x. Example:

Input: 4 6 2 -4
Output: 4 6 2 0
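The element-wise ReLU described above, as a one-line Python sketch:

```python
def relu(xs):
    """ReLU: y = max(0, x), applied element-wise."""
    return [max(0, x) for x in xs]

print(relu([4, 6, 2, -4]))
# → [4, 6, 2, 0]
```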

————————————————————————–

## Awesome Example for TemporalConvolution

————————————————————–

First things first, be sure to include the neural network package.

-- First, be sure to require the 'nn' package for the neural network functions
require 'nn';

Next, we’ll need to create some training data. Neural networks require many examples in order to train, so we choose to generate 10000 example signals. This number may seem large, but remember that we have 4 randomized components to each wave: type, height, width, and start index. This translates to 2*6*21*6 = 1512 possible permutations. In real life, problems are much more complex.

-- Next, create the training data. We'll use 10000 samples for now
nExamples = 10000

trainset = {}
trainset.data = torch.Tensor(nExamples,64,1):zero() -- Data will be sized as 10000x64x1
trainset.label = torch.Tensor(nExamples):zero()     -- Use one dimensional tensor for label

--The network trainer expects an index metatable
setmetatable(trainset,
    {__index = function(t, i)
        return {t.data[i], t.label[i]}  -- The trainer is expecting trainset[123] to be {data[123], label[123]}
    end}
);

--The network trainer expects a size function
function trainset:size()
    return self.data:size(1)
end

function GenerateTrainingSet()
    -- Time to prepare the training set with data
    -- At random, have data be either a triangular pulse, or a rectangular pulse
    -- Have randomness as to when the signal starts, ends, and how high it is
    for i=1,nExamples do
        curWaveType = math.random(1,2)      -- 1 for triangular signal, 2 for square pulse
        curWaveHeight = math.random(5,10)   -- how high is signal
        curWaveWidth = math.random(20,40)   -- how wide is signal
        curWaveStart = math.random(5,10)    -- when to start signal

        for j=1,curWaveStart-1 do
            trainset.data[i][j][1] = 0
        end

        if curWaveType==1 then   -- We are making a triangular wave
            delta = curWaveHeight / (curWaveWidth/2);
            for curIndex=1,curWaveWidth/2 do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * curIndex
            end
            for curIndex=(curWaveWidth/2)+1, curWaveWidth do
                trainset.data[i][curWaveStart-1+curIndex][1] = delta * (curWaveWidth-curIndex)
            end
            trainset.label[i] = 1
        else
            for j=1,curWaveWidth do
                trainset.data[i][curWaveStart-1+j][1] = curWaveHeight
            end
            trainset.label[i] = 2
        end
    end
end

GenerateTrainingSet()


Next, we will construct our neural network. Starting with 64×1 data going in, we will go through two Convolution-MaxPool-ReLU ‘layers’, and end with a two-layer fully connected neural network with two outputs. Because this is a classification problem, we’ll use log-probability output. Whichever output is greatest (closest to zero) is the selection of the network. The other output should have a negative value.

-- This is where we build the model
model = nn.Sequential()                       -- Create network

-- First convolution, using ten, 7-element kernels
model:add(nn.TemporalConvolution(1, 10, 7))   -- 64x1 goes in, 58x10 goes out
model:add(nn.TemporalMaxPooling(2))           -- 58x10 goes in, 29x10 goes out
model:add(nn.ReLU())                          -- non-linearity after the first conv-pool block

-- Second convolution, using 5, 7-element kernels
model:add(nn.TemporalConvolution(10, 5, 7))   -- 29x10 goes in, 23x5 goes out
model:add(nn.TemporalMaxPooling(2))           -- 23x5 goes in, 11x5 goes out
model:add(nn.ReLU())                          -- non-linearity after the second conv-pool block

-- After convolutional layers, time to do fully connected network
model:add(nn.View(11*5))                        -- Reshape network into 1D tensor

model:add(nn.Linear(11*5, 30))                  -- Fully connected layer, 55 inputs, 30 outputs

model:add(nn.Linear(30, 2))                     -- Final layer has 2 outputs. One for triangle wave, one for square
model:add(nn.LogSoftMax())                      -- log-probability output, since this is a classification problem
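To see why the winning class’s log-probability sits close to zero while the other is negative, here is a plain-Python sketch of what a log-softmax computes (illustrative only, not the Torch implementation):

```python
import math

def log_softmax(xs):
    # log(exp(x_i) / sum_j exp(x_j)), computed stably by subtracting the max
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

scores = log_softmax([3.2, 0.1])
print(scores)
# the larger input maps to a log-probability just below 0; the other is clearly negative
```

Exponentiating the outputs always gives probabilities that sum to 1, which is what nn.ClassNLLCriterion expects.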

With torch, we can see the dimensions of a tensor by applying a ‘#’ before it. So at any time when constructing the network, you can create a partially complete network, and propagate a blank tensor through it and see what the dimension of the last layer is.

-- When building the network, we can test the shape of the output by sending in a dummy tensor
#model:forward(torch.Tensor(64,1))

Next, we set our criterion to nn.ClassNLLCriterion, which is helpful for classification problems. Then, we create a trainer using the StochasticGradient descent algorithm, and set the learning rate and number of iterations. If the learning rate is too high, the network will not converge. If it is too low, the network will converge too slowly. So it takes practice to get this just right.

criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.01
trainer.maxIteration = 200 -- do 200 epochs of training


Finally, we train our model! Go grab a cup of coffee; it may take a while. Later we will focus on accelerating these training sessions with the GPU, but our network is so small right now that it isn’t practical to accelerate.

trainer:train(trainset)

We can see what an example output and label are below.

-- Lets see an example output
model:forward(trainset.data[123])

-- Lets see which label that is
trainset.label[123]

Let’s figure out how many of the examples are predicted correctly.

function TestTrainset()
    correct = 0
    for i=1,nExamples do
        local groundtruth = trainset.label[i]
        local prediction = model:forward(trainset.data[i])
        local confidences, indices = torch.sort(prediction, true)  -- sort in descending order
        if groundtruth == indices[1] then
            correct = correct + 1
        else
            --print("Incorrect! "..tostring(i))
        end
    end
    print(tostring(correct))
end

-- Lets see how many out of the 10000 samples we predict correctly!
TestTrainset()

Hopefully, that number should read 10,000. Next, let’s be sure our network is really trained well. Let us generate new training sets, and test them. Hopefully, everything will be 10,000, but if there are some incorrect examples, go back and train some more. In real life, we can suffer from a phenomenon called over-training where the model is over-fit to our training data, but we will cover this in a later article. Try to train your network until it passes everything you can throw at it.

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    TestTrainset()
end

Great, you’ve done it! Now, let’s try to gain some understanding into what’s going on here. We created two convolutional layers, the first having ten 1×7 kernels, and the second having five 10×7 kernels. The reason I use itorch instead of the command-line torch interface is so I can easily inspect graphics. Let’s take a look at the filters in the first convolutional layer. We can see that each row is a filter.

require 'image'
itorch.image(model.modules[1].weight)

We can also see which neurons activate the most. You can propagate any input through the network with the :forward function, as demonstrated earlier. Then, we can visualize the outputs of the ReLU (or any) layers. For example, here is the output of the first ReLU layer. It is obvious that some filters are activating more than others.

itorch.image(model.modules[3].output)

Next, let’s take a look at the next ReLU layer’s output. Here we can see that the neurons in the 5th layer are by far the most active for this input. So we know that even if our filters look a little chaotic, neurons in a particular layer do activate and stand out. Finally, these values are sent to the fully connected neural network, which makes sense of what it means when different filters are activated in relation to other filters.

itorch.image(model.modules[6].output)

Now that we understand how different filters activate with certain inputs, let us introduce noise into the system and see how the neural network deals with it.

function IntroduceNoise()
    for i=1,nExamples do
        for j=1,64 do
            trainset.data[i][j] = trainset.data[i][j] + torch.normal(0,.25);
        end
    end
end

-- Generate a new set of data, and test it
for i=1,10 do
    GenerateTrainingSet()
    IntroduceNoise()
    TestTrainset()
end

After training my network around 600 epochs, I was able to achieve 100% perfect signal categorization with the noisy inputs, even though I only trained on the noiseless inputs. Wow! This shows us that the network does indeed work, and is powerful enough to filter out noise which happens in real life data. Next, we will be ready for more interesting challenges!

-- To see the network's structure and variables
model.modules

# Thanks to this site:

http://supercomputingblog.com/machinelearning/an-intro-to-convolutional-networks-in-torch/

# Ref:

http://staff.ustc.edu.cn/~cheneh/paper_pdf/2014/Yi-Zheng-WAIM2014.pdf

# Convolution Tut: Temporal and Spatial

http://torch.ch/torch3/matos/convolutions.pdf

lookupTableLayer = nn.LookupTable(vector:size()[1], d)
for i=1,vector:size()[1] do
    lookupTableLayer.weight[i] = vector[i]
end
mlp=nn.Sequential();


Now, to train the network, I loop through every training example and for every example I call gradUpdate() which has this code (this is straight from the examples):

function gradUpdate(mlp, x, indexY, learningRate)
    -- the pasted snippet was missing the backward pass; restored here assuming a classification criterion
    local criterion = nn.ClassNLLCriterion()
    local pred = mlp:forward(x)
    criterion:forward(pred, indexY)
    mlp:zeroGradParameters()
    mlp:backward(x, criterion:backward(pred, indexY))
    mlp:updateParameters(learningRate)
end



https://github.com/ganeshjawahar/torch-teacher/blob/master/stanford/model_nngraph.lua

-- encode the question
local question_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(question):annotate{name = 'question_word_lookup'}
local question_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))
(nn.SplitTable(1, 2)(question_word_vectors)):annotate{name = 'question_encoder'}
local final_q_out = nn.Dropout(params.dropout)(nn.Unsqueeze(3)(nn.SelectTable(-1)(question_encoder))) -- get the last step output
-- encode the passage
local passage_word_vectors = nn.LookupTableMaskZero(#params.dataset.index2word, params.dim)(passage):annotate{name = 'passage_word_lookup'}
local passage_encoder = nn.BiSequencer(nn.GRU(params.dim, params.hid_size, nil, 0), nn.GRU(params.dim, params.hid_size, nil, 0):sharedClone(), nn.JoinTable(1, 1))
(nn.SplitTable(1, 2)(passage_word_vectors)):annotate{name = 'passage_encoder'}
local final_p_out = nn.Dropout(params.dropout)(nn.View(params.bsize, -1, 2 * params.hid_size)
(nn.JoinTable(2)(passage_encoder))) -- combine the forward and backward rnns' output

l = nn.LookupTableMaskZero(3, 1)
print(l:forward(torch.LongTensor{1}))
print(l:forward(torch.LongTensor{0}))


https://github.com/chapternewscu/image-captioning-with-semantic-attention/blob/master/test_attention_weights_criterion.lua

function Seq2Seq:buildModel()
    self.encoder = nn.Sequential()
    self.decoder = nn.Sequential()
end
Ref: https://github.com/Element-Research/rnn/issues/155

-- Encoder
local enc = nn.Sequential()
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode

-- Decoder
local dec = nn.Sequential()
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode

I implemented the model using Nicholas Leonard’s rnn package (https://github.com/Element-Research/rnn) as follows:

model = nn.Sequential()
model:add(nn.Sequencer(
    nn.Sequential()
))

criterion = nn.ClassNLLCriterion() -- not using SequencerCriterion as we only use the last output




function newModelBuild(dictionarySize,nbfeatures,embeddingSize,rhoInput,rhoOutput,lktype,logsoftFlag)
    local model=nn.Sequential()
    local p=nn.ParallelTable()
    p:add(nn.Identity()) -- -> carries the tensor of features
    local lkt=nn.LookupTable(dictionarySize, embeddingSize)
    local weightmatrix
    if lktype == 0 then
        weightmatrix=torch.Tensor(dictionarySize,embeddingSize)
        for i=1,dictionarySize do
            for j=1,embeddingSize do
                weightmatrix[i][j]=torch.uniform(0,1)
            end
        end
        lkt.weight:copy(weightmatrix)
    else
        lkt.weight:fill(1.0/embeddingSize)
    end
    local SliceList=nn.ConcatTable() -- purpose: create a list tensor created by joining tensors
    for i=1, rhoInput do
        local Slice =nn.Sequential()
        local cc=nn.ConcatTable()   -- contains the 2 tensors to join
        local a=nn.Sequential()
        a:add(nn.SelectTable(2)) -- we select list of tensor(i)
        a:add(nn.SelectTable(i)) -- we select a tensor(i)
        local b=nn.Sequential()
        Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
    end
    for i=rhoInput+1,rhoOutput do
        local Slice =nn.Sequential()
        local cc=nn.ConcatTable()   -- contains the 2 tensors to join
        local a=nn.Sequential()
        a:add(nn.SelectTable(2)) -- we select list of tensor(i)
        local b=nn.Sequential()
        Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
    end
    return model
end

Lookup table

Ref:

———————————————————-
I want to perform zero padding before TemporalConvolution (after the lookup table) in order to make sure that the input size is not less than the convolution window size. This problem was solved by using LookupTableMaskZero from the rnn package.

Ref

————————————————————————–

LRCN

https://github.com/garythung/torch-lrcn

lookup table

http://torch5.sourceforge.net/manual/nn/index-2-5-5.html

https://stackoverflow.com/questions/37748421/lstm-on-top-of-cnn

local function create_network()
    local x                = nn.Identity()()
    local y                = nn.Identity()()
    local prev_s           = nn.Identity()()
    local i                = {[0] = LookupTable(params.vocab_size,
                                                params.rnn_size)(x)}
    local next_s           = {}
    local split            = {prev_s:split(2 * params.layers)}
    for layer_idx = 1, params.layers do
        local prev_c         = split[2 * layer_idx - 1]
        local prev_h         = split[2 * layer_idx]
        local dropped        = nn.Dropout(params.dropout)(i[layer_idx - 1])
        local next_c, next_h = lstm(dropped, prev_c, prev_h)
        table.insert(next_s, next_c)
        table.insert(next_s, next_h)
        i[layer_idx] = next_h
    end
    local h2y              = nn.Linear(params.rnn_size, params.vocab_size)
    local dropped          = nn.Dropout(params.dropout)(i[params.layers])
    local pred             = nn.LogSoftMax()(h2y(dropped))
    local err              = nn.ClassNLLCriterion()({pred, y})
    local module           = nn.gModule({x, y, prev_s},
                                        {err, nn.Identity()(next_s)})
    module:getParameters():uniform(-params.init_weight, params.init_weight)
    return transfer_data(module)
end



# Misc

1. First find the process id of firefox using the following command in any directory:

   pidof firefox

2. Kill the firefox process using the following command in any directory:

   kill [firefox pid]

Then start firefox again.

Or you can do the same thing in just one command. As don_crissti said:

kill $(pidof firefox)

Ref: http://unix.stackexchange.com/questions/78689/fix-firefox-is-already-running-issue-in-linux

# Word Embedding

## Nice explanation of Word Embedding

https://devblogs.nvidia.com/parallelforall/understanding-natural-language-deep-neural-networks-using-torch/

Word embeddings are not unique to neural networks; they are common to all word-level neural language models. Embeddings are stored in a simple lookup table (or hash table) that, given a word, returns the embedding (which is an array of numbers). Figure 1 (check in ref link) shows an example. Word embeddings are usually initialized to random numbers (and learned during the training phase of the neural network), or initialized from previously trained models over large texts like Wikipedia.

### Feed-forward Convolutional Neural Networks

Convolutional Neural Networks (ConvNets), which were covered in a previous Parallel Forall post by Evan Shelhamer, have enjoyed wide success in the last few years in several domains including images, video, audio and natural language processing. When applied to images, ConvNets usually take raw image pixels as input, interleaving convolution layers along with pooling layers with non-linear functions in between, followed by fully connected layers. Similarly, for language processing, ConvNets take the outputs of word embeddings as input, and then apply interleaved convolution and pooling operations, followed by fully connected layers. Figure 2 shows an example ConvNet applied to sentences.
### Recurrent Neural Networks (RNN)

Convolutional Neural Networks (and more generally, feed-forward neural networks) do not traditionally have a notion of time or experience unless you explicitly pass samples from the past as input. After they are trained, given an input, they treat it no differently when shown the input the first time or the 100th time. But to tackle some problems, you need to look at past experiences and give a different answer.

If you send sentences word-by-word into a feed-forward network, asking it to predict the next word, it will do so, but without any notion of the current context. The animation in Figure 3 shows why context is important. Clearly, without context, you can produce sentences that make no sense. You can have context in feed-forward networks, but it is much more natural to add a recurrent connection.

A Recurrent Neural Network has the capability to give itself feedback from past experiences. Apart from all the neurons in the network, it maintains a hidden state that changes as it sees different inputs. This hidden state is analogous to short-term memory. It remembers past experiences and bases its current answer on both the current input as well as past experiences. An illustration is shown in Figure 4 (check in ref link).

### Long Short Term Memory (LSTM)

RNNs keep context in their hidden state (which can be seen as memory). However, classical recurrent networks forget context very fast. They take into account very few words from the past while doing prediction. Here is an example of a language modelling problem that requires longer-term memory:

I bought an apple … I am eating the _____

The probability of the word “apple” should be much higher than any other edible like “banana” or “spaghetti”, because the previous sentence mentioned that you bought an “apple”. Furthermore, any edible is a much better fit than non-edibles like “car” or “cat”.
Long Short Term Memory (LSTM) [6] units try to address the problem of such long-term dependencies. LSTM has multiple gates that act as a differentiable RAM memory. Access to memory cells is guarded by “read”, “write” and “erase” gates. Information stored in memory cells is available to the LSTM for a much longer time than in a classical RNN, which allows the model to make more context-aware predictions. An LSTM unit is shown in Figure 5. Exactly how LSTM works is unclear, and fully understanding it is a topic of contemporary research. However, it is known that LSTM outperforms conventional RNNs on many tasks.

## Torch + cuDNN + cuBLAS: Implementing ConvNets and Recurrent Nets efficiently

Torch is a scientific computing framework with packages for neural networks and optimization (among hundreds of others). It is based on the Lua language, which is similar to javascript and is treated as a wrapper for optimized C/C++ and CUDA code. At the core of Torch is a powerful tensor library similar to Numpy. The Torch tensor library has both CPU and GPU backends.

The neural networks package in torch implements modules, which are different kinds of neuron layers, and containers, which can have several modules within them. Modules are like Lego blocks, and can be plugged together to form complicated neural networks. Each module implements a function and its derivative. This makes it easy to calculate the derivative of any neuron in the network with respect to the objective function of the network (via the chain rule). The objective function is simply a mathematical formula to calculate how well a model is doing on the given task. Usually, the smaller the objective, the better the model performs.

The following small example of modules shows how to calculate the element-wise Tanh of an input matrix, by creating an nn.Tanh module and passing the input through it. We calculate the derivative with respect to the objective by passing it in the backward direction.
input = torch.randn(100)
m = nn.Tanh()
output = m:forward(input)
InputDerivative = m:backward(input, ObjectiveDerivative)

Implementing the ConvNet shown in Figure 2 is also very simple with Torch. In this example, we put all the modules into a Sequential container that chains the modules one after the other.

nWordsInDictionary = 100000
embeddingSize = 100
sentenceLength = 5

m = nn.Sequential() -- a container that chains modules one after another
m:add(nn.LookupTable(nWordsInDictionary, embeddingSize))
m:add(nn.TemporalConvolution(sentenceLength, 150, embeddingSize))
m:add(nn.Max(1))
m:add(nn.Linear(150, 1024))
m:add(nn.HardTanh())
m:add(nn.Linear())
m:cuda() -- transfer the model to GPU

This ConvNet has :forward and :backward functions that allow you to train your network (on CPUs or GPUs). Here we transfer it to the GPU by calling m:cuda().

An extension to the nn package is the nngraph package which lets you build arbitrary acyclic graphs of neural networks. nngraph makes it easier to build complicated modules such as the LSTM memory unit, as the following example code demonstrates.

local function lstm(i, prev_c, prev_h)
    local function new_input_sum()
        local i2h = nn.Linear(params.rnn_size, params.rnn_size)
        local h2h = nn.Linear(params.rnn_size, params.rnn_size)
        return nn.CAddTable()({i2h(i), h2h(prev_h)})
    end
    local in_gate     = nn.Sigmoid()(new_input_sum())
    local forget_gate = nn.Sigmoid()(new_input_sum())
    local in_gate2    = nn.Tanh()(new_input_sum())
    local next_c      = nn.CAddTable()({
        nn.CMulTable()({forget_gate, prev_c}),
        nn.CMulTable()({in_gate, in_gate2})
    })
    local out_gate    = nn.Sigmoid()(new_input_sum())
    local next_h      = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
    return next_c, next_h
end

With these few lines of code we can create powerful state-of-the-art neural networks, ready for execution on CPUs or GPUs with good efficiency.
cuBLAS, and more recently cuDNN, have accelerated deep learning research quite significantly, and the recent success of deep learning can be partly attributed to these awesome libraries from NVIDIA. [Learn more about cuDNN here!] cuBLAS is automatically used by Torch for BLAS operations such as matrix multiplications, and accelerates neural networks significantly compared to CPUs. To use NVIDIA cuDNN in Torch, simply replace the prefix nn. with cudnn.. cuDNN accelerates the training of neural networks compared to Torch's default CUDA backend (sometimes up to 30%) and is often several orders of magnitude faster than CPUs.

For language modeling, we've implemented an RNN-LSTM neural network [9] using Torch. It gives state-of-the-art results on a standard quality metric called perplexity. The full source of this implementation is available here. We compare the training time of the network on an Intel Core i7 2.6 GHz CPU against an NVIDIA GeForce GTX 980 GPU. Table 2 shows the training times and GPU speedups for a small RNN and a larger RNN.

Table 2: Training times of a state-of-the-art recurrent network with LSTM cells on CPU vs GPU.

## Conventional Neural Network

Figure 1: Conventional Neural Network

### 2.1 Lookup Table

The idea of distributed representation for symbolic data is one of the most important reasons why neural networks work. It was proposed by Hinton [11] and has been a research hot spot for more than twenty years [1, 6, 21, 16]. Formally, in the Chinese word segmentation task, we have a character dictionary D of size |D|. Unless otherwise specified, the character dictionary is extracted from the training set and unknown characters are mapped to a special symbol that is not used elsewhere. Each character c ∈ D is represented as a real-valued vector (character embedding) Embed(c) ∈ ℝ^d, where d is the dimensionality of the vector space. The character embeddings are then stacked into an embedding matrix M ∈ ℝ^(d×|D|).
For a character c ∈ D with associated index k, the corresponding character embedding Embed(c) ∈ ℝ^d is retrieved by the Lookup Table layer as shown in Figure 1:

  Embed(c) = M e_k  (1)

Here e_k ∈ ℝ^|D| is a binary vector which is zero in all positions except the k-th index. The Lookup Table layer can be seen as a simple projection layer where the character embedding for each context character is obtained by a table lookup operation according to its index. The embedding matrix M is initialized with small random numbers and trained by back-propagation. We analyze the effect of character embeddings in more detail in Section 4.

### 2.2 Tag Scoring

The most common tagging approach is the window approach, which assumes that the tag of a character largely depends on its neighboring characters. Given an input sentence c[1:n], a window of size w slides over the sentence from character c_1 to c_n. We set w = 5 in all experiments. As shown in Figure 1, at position c_i, 1 ≤ i ≤ n, the context characters are fed into the Lookup Table layer. Characters exceeding the sentence boundaries are mapped to one of two special symbols, namely the “start” and “end” symbols. The character embeddings extracted by the Lookup Table layer are then concatenated into a single vector a ∈ ℝ^H1, where H1 = w·d is the size of Layer 1. Then a is fed into the next layer, which performs a linear transformation followed by an element-wise activation function g, such as tanh, which is used in our experiments:

  h = g(W1 a + b1)  (2)

where W1 ∈ ℝ^(H2×H1), b1 ∈ ℝ^(H2×1), h ∈ ℝ^H2. H2 is a hyper-parameter giving the number of hidden units in Layer 2. Given a set of tags T of size |T|, a similar linear transformation is performed, except that no non-linear activation follows:

  f(t | c[i-2:i+2]) = W2 h + b2  (3)

where W2 ∈ ℝ^(|T|×H2), b2 ∈ ℝ^(|T|×1). f(t | c[i-2:i+2]) ∈ ℝ^|T| is the score vector, with one score for each possible tag.
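Equations (1)-(3) amount to a lookup, a concatenation, and two affine maps. A plain-Python sketch with toy dimensions and random weights (the sizes and names here are illustrative, not the paper's implementation):

```python
import math
import random

random.seed(0)

d, w, H2 = 3, 5, 4        # embedding size, window size, hidden units
T = 4                     # |T|: the four BMES tags
H1 = w * d                # size of the concatenated window vector a

vocab_size = 6            # toy dictionary, including the "start"/"end" symbols
M = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(vocab_size)]

def affine(W, b, x):
    # returns W x + b, one output per row of W
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

W1 = [[random.uniform(-0.1, 0.1) for _ in range(H1)] for _ in range(H2)]
b1 = [0.0] * H2
W2 = [[random.uniform(-0.1, 0.1) for _ in range(H2)] for _ in range(T)]
b2 = [0.0] * T

def tag_scores(window):
    # Eq. (1): look up each character's embedding, then concatenate into a
    a = [component for k in window for component in M[k]]
    # Eq. (2): hidden layer with element-wise tanh
    h = [math.tanh(z) for z in affine(W1, b1, a)]
    # Eq. (3): linear layer producing one score per tag (no non-linearity)
    return affine(W2, b2, h)

scores = tag_scores([0, 2, 3, 4, 1])  # a window of w = 5 character indices
print(scores)                         # four numbers, one per BMES tag
```

The lookup in Eq. (1) is written here as plain indexing into M, which is exactly what multiplying M by the one-hot vector e_k computes.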
In Chinese word segmentation, the most prevalent tag set T is the BMES tag set, which uses four tags to carry word-boundary information: B, M, E and S denote the Beginning, the Middle and the End of a word, and a Single-character word, respectively. We use this tag set in our method.

### 2.3 Model Training and Inference

Despite sharing the commonalities mentioned above, previous work models the segmentation task differently and therefore uses different training and inference procedures. Mansur et al. [15] modeled Chinese word segmentation as a series of classification tasks, one at each position of the sentence, in which the tag score is transformed into a probability using the softmax function:

  p(t_i | c[i-2:i+2]) = exp(f(t_i | c[i-2:i+2])) / Σ_{t′} exp(f(t′ | c[i-2:i+2]))

The model is then trained by maximum likelihood, maximizing the log-likelihood of the tagged data. Obviously, it is a local model which cannot capture the dependency between tags and does not support global inference of the tag sequence. To model the tag dependency, previous neural network models [6, 35] introduce a transition score A_ij for jumping from tag i ∈ T to tag j ∈ T. For an input sentence c[1:n] with a tag sequence t[1:n], the sentence-level score is then given by the sum of transition and network scores:

  s(c[1:n], t[1:n], θ) = Σ_{i=1}^{n} ( A_{t_{i-1} t_i} + f_θ(t_i | c[i-2:i+2]) )  (4)

where f_θ(t_i | c[i-2:i+2]) is the score output for tag t_i at the i-th character by the network with parameters θ = (M, A, W1, b1, W2, b2). Given the sentence-level score, Zheng et al. [35] proposed a perceptron-style training algorithm inspired by the work of Collins [5]. Compared with Mansur et al. [15], their model is a global one where training and inference are performed at sentence level. Workable as these methods are, one of their limitations is that the tag-tag interaction and the neural network are modeled separately.
The simple tag-tag transition neglects the impact of context characters and thus limits the ability to capture flexible interactions between tags and context characters. Moreover, the simple non-linear transformation in equation (2) is also too weak to model the complex interactional effects in Chinese word segmentation.

Ref: https://www.aclweb.org/anthology/P/P14/P14-1028.xhtml

```lua
require "rnn"
require "cunn"

torch.manualSeed(123)

batch_size = 2
maxLen = 4
wordVec = 5
nWords = 100
mode = 'CPU'

-- create random data with zeros as empty indicator
inp1 = torch.ceil(torch.rand(batch_size, maxLen) * nWords)
-- labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s

-- not all sequences have the same length; 0 is the placeholder
for i = 1, batch_size do
  n_zeros = torch.random(maxLen - 2)
  inp1[{{i}, {1, n_zeros}}] = torch.zeros(n_zeros)
end

-- make the first sequence the same as the second
inp1[{{2}, {}}] = inp1[{{1}, {}}]:clone()

lstm = nn.Sequential()
lstm:add(nn.LookupTableMaskZero(nWords, wordVec)) -- convert indices to word vectors; index 0 maps to a zero vector
lstm:add(nn.SplitTable(1)) -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.MaskZero(nn.LSTM(wordVec, wordVec), 1))) -- seq to seq; zero inputs produce zero outputs

if mode == 'GPU' then
  -- note: criterion and labels are only defined in the full training script
  lstm:cuda()
  criterion:cuda()
  labels = labels:cuda()
  inp1 = inp1:cuda()
end

out = lstm:forward(inp1)
print('input 1', inp1[1])
print('lstm out 1', out[1])
print('input 2', inp1[2])   -- should be the same as above
print('lstm out 2', out[2]) -- should be the same as above
```

REF http://cseweb.ucsd.edu/~dasgupta/254-deep/stefanos.pdf

Natural language understanding (almost) from scratch

http://resola.ai/dev/

https://iksinc.wordpress.com/tag/continuous-bag-of-words-cbow/

The final layer of the network has one node for each candidate tag; each output is interpreted as the score for the associated tag.

What is a word vector? At one level, it's simply a vector of weights.
In a simple 1-of-N (or ‘one-hot’) encoding, every element in the vector is associated with a word in the vocabulary. The encoding of a given word is simply the vector in which the corresponding element is set to one and all other elements are zero. Suppose our vocabulary has only five words: King, Queen, Man, Woman, and Child. We could encode the word ‘Queen’ as [0, 1, 0, 0, 0]. Using such an encoding, there's no meaningful comparison we can make between word vectors other than equality testing.

In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words. If we labeled the dimensions of a hypothetical word vector (there are no such pre-assigned labels in the algorithm, of course), a dimension resembling ‘royalty’ might carry high weight for both ‘King’ and ‘Queen’, while a ‘masculinity’ dimension would separate ‘King’ from ‘Queen’. Such a vector comes to represent in some abstract way the ‘meaning’ of a word. And as we'll see next, simply by examining a large corpus it's possible to learn word vectors that capture the relationships between words in a surprisingly expressive way. We can also use the vectors as inputs to a neural network.

### Reasoning with word vectors

We find that the learned word representations in fact capture meaningful syntactic and semantic regularities in a very simple way. Specifically, the regularities are observed as constant vector offsets between pairs of words sharing a particular relationship. For example, if we denote the vector for word i as x_i and focus on the singular/plural relation, we observe that x_apple − x_apples ≈ x_car − x_cars, x_family − x_families ≈ x_car − x_cars, and so on.
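A quick way to see the offset arithmetic at work, sketched in plain Python with tiny hand-picked vectors (hypothetical values, chosen so the gender offset is roughly constant across pairs):

```python
import math

# Hypothetical 3-d "embeddings": dim 1 ~ royalty, dim 2 ~ masculinity, dim 3 ~ femininity.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.3, 0.9, 0.1],
    "woman": [0.3, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman should land closest to queen
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max(vecs, key=lambda word: cosine(vecs[word], target))
print(best)  # queen
```

This is the same vector-offset method based on cosine distance used for the analogy questions discussed next, just on a toy vocabulary.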
Perhaps more surprisingly, we find that this is also the case for a variety of semantic relations, as measured by the SemEval 2012 task of measuring relation similarity. The vectors are very good at answering analogy questions of the form "a is to b as c is to ?". For example, "man is to woman as uncle is to ?" is answered with "aunt" using a simple vector-offset method based on cosine distance. The vector offsets for word pairs such as man/woman, uncle/aunt and king/queen illustrate the gender relation.

Ref: The amazing power of word vectors

# Word Embedding Code

In Torch, a LookupTable:

```lua
self.llstm = LSTM
self.rlstm = LSTM
local modules = nn.Parallel()
  :add(nn.LookupTable(self.vocab_size, self.emb_size))
  :add(nn.Collapse(2))
  :add(self.llstm)
  :add(self.my_module)
self.params, self.grad_params = modules:getParameters()
```

ref http://stackoverflow.com/questions/37126328/how-to-use-nn-lookuptable-in-torch

# Multiple batches LSTM

ref https://github.com/Element-Research/rnn/issues/74

# Sequence-to-sequence networks

```lua
--[[ Example of "coupled" separate encoder and decoder networks,
     e.g. for sequence-to-sequence networks. ]]--
require 'rnn'

version = 1.2 -- refactored numerical gradient test into unit tests; added training loop

local opt = {}
opt.learningRate = 0.1
opt.hiddenSize = 6
opt.vocabSize = 5
opt.seqLen = 3 -- length of the encoded sequence
opt.niter = 1000

--[[ Forward coupling: copy encoder cell and output to decoder LSTM ]]--
local function forwardConnect(encLSTM, decLSTM, seqLen)
  decLSTM.userPrevOutput = nn.rnn.recursiveCopy(decLSTM.userPrevOutput, encLSTM.outputs[seqLen])
  decLSTM.userPrevCell = nn.rnn.recursiveCopy(decLSTM.userPrevCell, encLSTM.cells[seqLen])
end

--[[ Backward coupling: copy decoder gradients to encoder LSTM ]]--
local function backwardConnect(encLSTM, decLSTM)
  encLSTM.userNextGradCell = nn.rnn.recursiveCopy(encLSTM.userNextGradCell, decLSTM.userGradPrevCell)
  encLSTM.gradPrevOutput = nn.rnn.recursiveCopy(encLSTM.gradPrevOutput, decLSTM.userGradPrevOutput)
end

-- Encoder
local enc = nn.Sequential()
enc:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
enc:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local encLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
enc:add(nn.Sequencer(encLSTM))
enc:add(nn.SelectTable(-1))

-- Decoder
local dec = nn.Sequential()
dec:add(nn.LookupTableMaskZero(opt.vocabSize, opt.hiddenSize))
dec:add(nn.SplitTable(1, 2)) -- works for both online and mini-batch mode
local decLSTM = nn.LSTM(opt.hiddenSize, opt.hiddenSize):maskZero(1)
dec:add(nn.Sequencer(decLSTM))
dec:add(nn.Sequencer(nn.MaskZero(nn.Linear(opt.hiddenSize, opt.vocabSize), 1)))
dec:add(nn.Sequencer(nn.MaskZero(nn.LogSoftMax(), 1)))
-- dec = nn.MaskZero(dec, 1)

local criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1))

-- Some example data (encInSeq1: batch size 2; encInSeq: batch size 3)
local encInSeq1 = torch.Tensor({{1,2,3},{3,2,1}})
local decInSeq1 = torch.Tensor({{1,2,3,4},{2,4,3,1}})
local decOutSeq1 = torch.Tensor({{2,3,4,1},{4,3,1,2}})
decOutSeq1 = nn.SplitTable(1, 1):forward(decOutSeq1)

local encInSeq = torch.Tensor({{1,1,1,2,3},{0,0,1,2,3},{0,0,3,2,1}})
local decInSeq = torch.Tensor({{1,1,1,1,2,3},{1,2,3,4,0,0},{2,4,3,1,0,0}})
local decOutSeq = torch.Tensor({{1,1,1,2,3,2},{2,3,4,1,0,0},{4,3,1,2,0,0}})
decOutSeq = nn.SplitTable(1, 1):forward(decOutSeq)
print(decOutSeq)

print('encoder:')
for i, module in ipairs(enc:listModules()) do print(module) break end
print('decoder:')
for i, module in ipairs(dec:listModules()) do print(module) break end

local function train(i, encInSeq, decInSeq, decOutSeq)
  -- Forward pass
  local len = encInSeq:size(2)
  local encOut = enc:forward(encInSeq)
  forwardConnect(encLSTM, decLSTM, len)
  local decOut = dec:forward(decInSeq)
  local err = criterion:forward(decOut, decOutSeq)
  print(string.format("Iteration %d ; NLL err = %f ", i, err))

  -- Backward pass
  local gradOutput = criterion:backward(decOut, decOutSeq)
  dec:backward(decInSeq, gradOutput)
  backwardConnect(encLSTM, decLSTM)
  local zeroTensor = torch.Tensor(2):zero()
  enc:backward(encInSeq, zeroTensor)

  dec:updateParameters(opt.learningRate)
  enc:updateParameters(opt.learningRate)
  enc:zeroGradParameters()
  dec:zeroGradParameters()
  dec:forget()
  enc:forget()
  encLSTM:recycle()
  decLSTM:recycle()
end

for i = 1, 1000 do
  train(i, encInSeq, decInSeq, decOutSeq)
  -- train(i, encInSeq1, decInSeq1, decOutSeq1)
end
```

narrow(dim, index, size) returns a new Tensor which is a narrowed version of the current one: the dimension dim is narrowed from index to index+size-1.
```lua
> x = torch.Tensor(5, 6):zero()
> print(x)
 0 0 0 0 0 0
 0 0 0 0 0 0
 0 0 0 0 0 0
 0 0 0 0 0 0
 0 0 0 0 0 0
[torch.Tensor of dimension 5x6]

> y = x:narrow(1, 2, 3) -- narrow dimension 1 from index 2 to index 2+3-1
> y:fill(1) -- fill with 1
> print(y)
 1 1 1 1 1 1
 1 1 1 1 1 1
 1 1 1 1 1 1
[torch.Tensor of dimension 3x6]

> print(x) -- memory in x has been modified!
 0 0 0 0 0 0
 1 1 1 1 1 1
 1 1 1 1 1 1
 1 1 1 1 1 1
 0 0 0 0 0 0
[torch.Tensor of dimension 5x6]
```

# Class

https://github.com/torch/torch7/blob/master/doc/utility.md

[metatable] torch.class(name, [parentName], [module])

https://github.com/torch/class

## Object Classes for Lua

This package provides simple object-oriented capabilities to Lua. Each class is defined with a metatable, which contains methods. Inheritance is achieved by setting metatables over metatables. Efficient type checking is provided.

## Typical Example

```lua
local class = require 'class'

-- define some dummy A class
local A = class('A')

function A:__init(stuff)
  self.stuff = stuff
end

function A:run()
  print(self.stuff)
end

-- define some dummy B class, inheriting from A
local B = class('B', 'A')

function B:__init(stuff)
  A.__init(self, stuff) -- call the parent init
end

function B:run5()
  for i = 1, 5 do
    print(self.stuff)
  end
end

-- create some instances of both classes
local a = A('hello world from A')
local b = B('hello world from B')

-- run stuff
a:run()
b:run()
b:run5()
```

## Documentation

First, require the package:

```lua
local class = require 'class'
```

Note that class does not clutter the global namespace.

Class metatables are then created with class(name) or, equivalently, class.new(name).

```lua
local A = class('A')
local B = class('B', 'A') -- B inherits from A
```

You then have to fill up the returned metatable with methods.

```lua
function A:myMethod()
  -- do something
end
```

torch.class(name, [parentName]) creates a new Torch class called name. If parentName is provided, the class will inherit parentName's methods. A class is a table which has a particular metatable.
If module is not provided and name is of the form package.className, then the class className will be added to the specified package. In that case, package has to be a valid (and already loaded) package. If name does not contain any '.', the class will be defined in the global environment. If a module table is provided, the class will be defined in this table at key className.

One [or two] (meta)tables are returned. These tables contain all the methods provided by the class [and by its parent class, if one has been provided]. After a call to torch.class() you have to fill up the metatable properly.

After the class definition is complete, a new instance is constructed by calling name(). This call will first invoke the method __init() if it exists, passing along all arguments of name().

```lua
-- for naming convenience
do
  --- creates a class "Foo"
  local Foo = torch.class('Foo')

  --- the initializer
  function Foo:__init()
    self.contents = 'this is some text'
  end

  --- a method
  function Foo:print()
    print(self.contents)
  end

  --- another one
  function Foo:bip()
    print('bip')
  end
end

--- now create an instance of Foo
foo = Foo()

--- try it out
foo:print()

--- create a class torch.Bar which inherits from Foo
do
  local Bar, parent = torch.class('torch.Bar', 'Foo')

  --- the initializer
  function Bar:__init(stuff)
    --- call the parent initializer on ourself
    parent.__init(self)
    --- do some stuff
    self.stuff = stuff
  end

  --- a new method
  function Bar:boing()
    print('boing!')
  end

  --- override parent's method
  function Bar:print()
    print(self.contents)
    print(self.stuff)
  end
end

--- create a new instance and use it
bar = torch.Bar('ha ha!')
bar:print() -- overridden method
bar:boing() -- child method
bar:bip()   -- parent's method
```

Narrow

https://github.com/torch/torch7/blob/master/doc/tensor.md
http://jucor.github.io/torch-doc-template/tensor.html#toc_33
http://torch7.readthedocs.io/en/rtd/maths/
https://github.com/torch/torch7/blob/master/doc/storage.md

# Attention Model for CNN

# Attention Model for RNN

https://github.com/harvardnlp/seq2seq-attn/blob/master/s2sa/models.lua

## Imp blog

Attention and Memory in Deep Learning and NLP

# NNgraph

https://github.com/torch/nngraph

### A network with containers

Another net that uses container modules (like ParallelTable) and outputs a table of outputs:

```lua
m = nn.Sequential()
m:add(nn.SplitTable(1))
m:add(nn.ParallelTable():add(nn.Linear(10, 20)):add(nn.Linear(10, 30)))
input = nn.Identity()()
input1, input2 = m(input):split(2)
m3 = nn.JoinTable(1)({input1, input2})

g = nn.gModule({input}, {m3})

indata = torch.rand(2, 10)
gdata = torch.rand(50)
g:forward(indata)
g:backward(indata, gdata)

graph.dot(g.fg, 'Forward Graph')
graph.dot(g.bg, 'Backward Graph')
```

Tensor

http://jucor.github.io/torch-doc-template/tensor.html

# LSTM

http://kbullaughey.github.io/lstm-play/lstm/

# Torch Tips

python -m SimpleHTTPServer 8888 (serves the current directory; browse to http://your-ip:8888 from any machine on the network)

screen:
- list screens: screen -ls
- resume a screen: screen -r 18497.new_vision
- detach: Ctrl+A D

# How to convert a table to a tensor in torch

### torch.Tensor(table)

The argument is assumed to be a Lua array of numbers. The constructor returns a new Tensor of the size of the table, containing all the table elements. The table may be multi-dimensional.

Example:

```lua
> torch.Tensor({{1,2,3,4}, {5,6,7,8}})
 1 2 3 4
 5 6 7 8
[torch.DoubleTensor of dimension 2x4]
```

```lua
-- p is a table; torch operations work on tensors, not tables
p = {0.3148, 0.3574, 0.3829, 0.3967, 0.4062, 0.4180, 0.4208, 0.4267, 0.4312, 0.4329}
```
```lua
x = torch.Tensor(p)
print(x)
print('p', p)

q = {0.2603, 0.3541, 0.3874, 0.4088, 0.4232, 0.4330, 0.4404, 0.4479,
     0.4549, 0.4608, 0.4631, 0.4693, 0.4740, 0.4822}
y = torch.Tensor(q) -- q must be defined before this line
print(y)

x = torch.mul(x, 10000) -- x must be a tensor; torch.mul on a table raises an error
y = torch.mul(y, 10000)
print(y)
print(x)
```

### [res] torch.mul([res,] tensor1, value)

Multiplies all elements in the Tensor by the given value.

- z = torch.mul(x, 2) returns a new Tensor with the result of x * 2.
- torch.mul(z, x, 2) puts the result of x * 2 in z.
- x:mul(2) multiplies all elements of x by 2 in-place.
- z:mul(x, 2) puts the result of x * 2 in z.

ref
https://github.com/torch/torch7/blob/master/doc/maths.md
https://github.com/torch/torch7/blob/master/doc/tensor.md

for math operations
https://github.com/torch/torch7/blob/master/doc/maths.md

for adding tables
https://github.com/torch/nn/blob/master/doc/table.md#nn.CAddTable

to plot graphs
https://github.com/torch/optim/blob/master/doc/logger.md

```lua
--[[ Logger: a simple class to log symbols during training, and automate plot generation

Example:
logger = optim.Logger('somefile.log')   -- file to save stuff

for i = 1, N do                         -- log some symbols during training/testing
  train_error = ...
  test_error = ...
  logger:add{['training error'] = train_error, ['test error'] = test_error}
end

logger:style{['training error'] = '-',  -- define styles for plots
             ['test error'] = '-'}
logger:plot()                           -- and plot

-- OR --

logger = optim.Logger('somefile.log')   -- file to save stuff
logger:setNames{'training error', 'test error'}

for i = 1, N do                         -- log some symbols during training/testing
  train_error = ...
  test_error = ...
  logger:add{train_error, test_error}
end

logger:style{'-', '-'}                  -- define styles for plots
logger:plot()                           -- and plot

logger:setlogscale(true)                -- enable logscale on Y-axis
logger:plot()                           -- and plot
]]
```

## JoinTable

module = JoinTable(dimension, nInputDims)

Creates a module that takes a table of Tensors as input and outputs a Tensor obtained by joining them along dimension dimension. In the diagram below, dimension is set to 1.

```
+----------+             +-----------+
| {input1, +-------------> output[1] |
|          |           +-----------+-+
|  input2, +-----------> output[2] |
|          |         +-----------+-+
|  input3} +---------> output[3] |
+----------+         +-----------+
```

The optional parameter nInputDims allows you to specify the number of dimensions that this module will receive. This makes it possible to forward both minibatch and non-minibatch Tensors through the same module.

### Example 1

```lua
x = torch.randn(5, 1)
y = torch.randn(5, 1)
z = torch.randn(2, 1)

print(nn.JoinTable(1):forward{x, y})
print(nn.JoinTable(2):forward{x, y})
print(nn.JoinTable(1):forward{x, z})
```

gives the output:

```
 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
 0.1575
 0.4491
 0.6580
 0.1784
-1.7362
[torch.DoubleTensor of dimension 10x1]

 1.3965  0.1575
 0.5146  0.4491
-1.5244  0.6580
-0.9540  0.1784
 0.4256 -1.7362
[torch.DoubleTensor of dimension 5x2]

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
-1.2660
 1.0869
[torch.Tensor of dimension 7x1]
```

### Example 2

```lua
module = nn.JoinTable(2, 2)

x = torch.randn(3, 1)
y = torch.randn(3, 1)
mx = torch.randn(2, 3, 1)
my = torch.randn(2, 3, 1)

print(module:forward{x, y})
print(module:forward{mx, my})
```

gives the output:

```
 0.4288  1.2002
-1.4084 -0.7960
-0.2091  0.1852
[torch.DoubleTensor of dimension 3x2]

(1,.,.) =
  0.5561  0.1228
 -0.6792  0.1153
  0.0687  0.2955

(2,.,.) =
  2.5787  1.8185
 -0.9860  0.6756
  0.1989 -0.4327
[torch.DoubleTensor of dimension 2x3x2]
```

### A more complicated example

```lua
mlp = nn.Sequential()     -- Create a network that takes a Tensor as input
c = nn.ConcatTable()      -- The same Tensor goes through two different Linear
c:add(nn.Linear(10, 3))   -- layers in parallel
c:add(nn.Linear(10, 7))
mlp:add(c)                -- Outputting a table with 2 elements
p = nn.ParallelTable()    -- These tables go through two more linear layers
p:add(nn.Linear(3, 2))    -- separately
p:add(nn.Linear(7, 1))
mlp:add(p)
mlp:add(nn.JoinTable(1))  -- Finally, the tables are joined together and output

pred = mlp:forward(torch.randn(10))
print(pred)

for i = 1, 100 do         -- A few steps of training such a network
  x = torch.ones(10)
  y = torch.Tensor(3); y:copy(x:narrow(1, 1, 3))
  pred = mlp:forward(x)

  criterion = nn.MSECriterion()
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(0.05)
  print(err)
end
```

Ref: https://github.com/torch/nn/blob/master/doc/table.md#nn.JoinTable

## Concat

module = nn.Concat(dim)

Concat concatenates the output of one layer of "parallel" modules along the provided dimension dim: they take the same inputs, and their output is concatenated.

```lua
mlp = nn.Concat(1)
mlp:add(nn.Linear(5, 3))
mlp:add(nn.Linear(5, 7))
print(mlp:forward(torch.randn(5)))
```

which gives the output:

```
 0.7486
 0.1349
 0.7924
-0.0371
-0.4794
 0.3044
-0.0835
-0.7928
 0.7856
-0.1815
[torch.Tensor of dimension 10]
```

### [res] torch.cat( [res,] {x_1, x_2, …}, [dimension] )

x = torch.cat(x_1, x_2, [dimension]) returns a Tensor x which is the concatenation of Tensors x_1 and x_2 along dimension dimension. If dimension is not specified, it is the last dimension. The other dimensions of x_1 and x_2 have to be equal. Arrays with arbitrary numbers of Tensors are also supported as inputs.
Examples:

```lua
> torch.cat(torch.ones(3), torch.zeros(2))
 1
 1
 1
 0
 0
[torch.DoubleTensor of size 5]

> torch.cat(torch.ones(3, 2), torch.zeros(2, 2), 1)
 1  1
 1  1
 1  1
 0  0
 0  0
[torch.DoubleTensor of size 5x2]

> torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 1)
 1  1
 1  1
 0  0
 0  0
[torch.DoubleTensor of size 4x2]

> torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 2)
 1  1  0  0
 1  1  0  0
[torch.DoubleTensor of size 2x4]

> torch.cat(torch.cat(torch.ones(2, 2), torch.zeros(2, 2), 1), torch.rand(3, 2), 1)
 1.0000  1.0000
 1.0000  1.0000
 0.0000  0.0000
 0.0000  0.0000
 0.3227  0.0493
 0.9161  0.1086
 0.2206  0.7449
[torch.DoubleTensor of size 7x2]

> torch.cat({torch.ones(2, 2), torch.zeros(2, 2), torch.rand(3, 2)}, 1)
 1.0000  1.0000
 1.0000  1.0000
 0.0000  0.0000
 0.0000  0.0000
 0.3227  0.0493
 0.9161  0.1086
 0.2206  0.7449
[torch.DoubleTensor of size 7x2]
```

## ConcatTable

module = nn.ConcatTable()

ConcatTable is a container module that applies each member module to the same input Tensor or table.

```
                 +-----------+
            +----> {member1, |
+-------+   |    |           |
| input +---+---->  member2, |
+-------+   |    |           |
   or       +---->  member3} |
 {input}         +-----------+
```

### Example 1

```lua
mlp = nn.ConcatTable()
mlp:add(nn.Linear(5, 2))
mlp:add(nn.Linear(5, 3))

pred = mlp:forward(torch.randn(5))
for i, k in ipairs(pred) do print(i, k) end
```

which gives the output:

```
1
-0.4073
 0.0110
[torch.Tensor of dimension 2]

2
 0.0027
-0.0598
-0.1189
[torch.Tensor of dimension 3]
```

### Example 2

```lua
mlp = nn.ConcatTable()
mlp:add(nn.Identity())
mlp:add(nn.Identity())

pred = mlp:forward{torch.randn(2), {torch.randn(3)}}
print(pred)
```

which gives the output (using th):

```
{
  1 :
    {
      1 : DoubleTensor - size: 2
      2 :
        {
          1 : DoubleTensor - size: 3
        }
    }
  2 :
    {
      1 : DoubleTensor - size: 2
      2 :
        {
          1 : DoubleTensor - size: 3
        }
    }
}
```

ref:
https://github.com/torch/nn/blob/master/doc/containers.md#nn.Concat
https://github.com/torch/nn/blob/master/doc/table.md#nn.ConcatTable

# Math Library Tutorial

The math library is documented in section 6.7 of the Reference Manual.[1] Below is a summary of the functions and variables provided. Each is described, with an example, on this page.

math.abs math.acos math.asin math.atan math.ceil math.cos math.deg math.exp math.floor math.fmod math.huge math.log math.max math.maxinteger math.min math.mininteger math.modf math.pi math.rad math.random math.randomseed math.sin math.sqrt math.tan math.tointeger math.type math.ult

### math.abs

Returns the absolute (non-negative) value of the given value.

```lua
> = math.abs(-100)
100
> = math.abs(25.67)
25.67
> = math.abs(0)
0
```

### math.acos, math.asin

Return the inverse cosine and sine, in radians, of the given value.

```lua
> = math.acos(1)
0
> = math.acos(0)
1.5707963267949
> = math.asin(0)
0
> = math.asin(1)
1.5707963267949
```

Ref: http://lua-users.org/wiki/MathLibraryTutorial

# Neural Network Package

This package provides an easy and modular way to build and train simple or complex neural networks using Torch.

Ref: https://github.com/torch/nn

# Convolutional layers

A convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities.
These are divided based on the dimensionality of the input and output Tensors.

REF:
https://github.com/torch/nn/blob/master/doc/convolution.md#nn.convlayers.dok
https://github.com/torch/nn/blob/master/doc/training.md#nn.traningneuralnet.dok

# Simple layers

Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations:

Parameterized Modules:
- Linear: a linear transformation;
- SparseLinear: a linear transformation with sparse inputs;
- Bilinear: a bilinear transformation with sparse inputs;
- PartialLinear: a linear transformation with sparse inputs, with the option of computing only a subset;
- Add: adds a bias term to the incoming data;
- Mul: multiplies the incoming data by a single scalar factor;
- CMul: a component-wise multiplication of the incoming data;
- Euclidean: the euclidean distance of the input to k mean centers;
- WeightedEuclidean: similar to Euclidean, but additionally learns a diagonal covariance matrix;
- Cosine: the cosine similarity of the input to k mean centers.

Modules that adapt basic and mathematical Tensor methods:
- AddConstant: adding a constant;
- MulConstant: multiplying by a constant;
- Max: a max operation over a given dimension;
- Min: a min operation over a given dimension;
- Mean: a mean operation over a given dimension;
- Sum: a sum operation over a given dimension;
- Exp: an element-wise exp operation;
- Log: an element-wise log operation;
- Abs: an element-wise abs operation;
- Power: an element-wise pow operation;
- Square: an element-wise square operation;
- Sqrt: an element-wise sqrt operation;
- Clamp: an element-wise clamp operation;
- Normalize: normalizes the input to have unit L_p norm;
- MM: matrix-matrix multiplication (also supports batches of matrices).

Miscellaneous Modules:
- BatchNormalization: mean/std normalization over the mini-batch inputs (with an optional affine transform);
- Identity: forward input as-is to output (useful with ParallelTable);
- Dropout: masks parts of the input using binary samples from a Bernoulli distribution;
- SpatialDropout: same as Dropout but for spatial inputs where adjacent pixels are strongly correlated;
- VolumetricDropout: same as Dropout but for volumetric inputs where adjacent voxels are strongly correlated;
- Padding: adds padding to a dimension;
- L1Penalty: adds an L1 penalty to an input (for sparsity);
- GradientReversal: reverses the gradient (to maximize an objective function);
- GPU: decorates a module so that it can be executed on a specific GPU device.

# Table Layers

This set of modules allows the manipulation of tables through the layers of a neural network, which makes it possible to build very rich architectures.

# CmdLine

This class provides a parameter-parsing framework, which is very useful when one needs to run several experiments that rely on different parameter settings passed on the command line. The class also overrides the default print function to direct all output to a log file as well as the screen at the same time. A sample Lua file that makes use of the CmdLine class is given below.

```lua
cmd = torch.CmdLine()
cmd:text()
cmd:text()
cmd:text('Training a simple network')
cmd:text()
cmd:text('Options')
cmd:option('-seed', 123, 'initial random seed')
cmd:option('-booloption', false, 'boolean option')
cmd:option('-stroption', 'mystring', 'string option')
cmd:text()

-- parse input params
params = cmd:parse(arg)

params.rundir = cmd:string('experiment', params, {dir=true})
paths.mkdir(params.rundir)
-- create log file
```

## Torch Packages

- Tensor Library
  - Tensor defines the all-powerful tensor object that provides multi-dimensional numerical arrays with type templating.
  - Mathematical operations defined for the tensor object types.
  • Storage defines a simple storage interface that controls the underlying storage for any tensor object.
• File I/O Interface Library
• Useful Utilities
  • Timer provides functionality for measuring time.
  • Tester is a generic tester framework.
  • CmdLine is a command line argument parsing utility.
  • Random defines a random number generator package with various distributions.
  • Finally, useful utility functions are provided for easy handling of torch tensor types and class inheritance.

Ref:
https://github.com/torch/nn/blob/master/doc/simple.md#nn.Select
https://github.com/torch/nn/blob/master/doc/table.md

Blog: Deep learning in Torch
http://rnduja.github.io/2015/10/07/deep_learning_with_torch_step_4_nngraph/
https://github.com/torch/torch7/blob/master/doc/cmdline.md
https://github.com/torch/torch7

Deep Learning with Torch
REF:
http://learning.cs.toronto.edu/wp-content/uploads/2015/02/torch_tutorial.pdf
http://hunch.net/~nyoml/torch7.pdf
http://atamahjoubfar.github.io/Torch_for_Matlab_users.pdf
http://ml.informatik.uni-freiburg.de/_media/teaching/ws1415/presentation_dl_lect3.pdf

# A Torch autoencoder example

## Extracting features from MNIST digits
http://rnduja.github.io/2015/11/06/torch-autoencoder/
http://rnduja.github.io/2015/10/13/torch-mnist/

# Introduction to nngraph
https://github.com/torch/nngraph
http://kbullaughey.github.io/lstm-play/2015/09/18/introduction-to-nngraph.html

LSTM and Fast LSTM
http://www.humphreysheil.com/blog/getting-to-grips-with-lstm-part-one
http://christopher5106.github.io/deep/learning/2016/07/14/element-research-torch-rnn-tutorial.html

Torch Slides
http://hunch.net/~nyoml/torch7.pdf
https://moodle.technion.ac.il/mod/forum/discuss.php?d=293691&lang=en
https://github.com/torch/torch7/wiki/Cheatsheet

## Module

Module is an abstract class which defines the fundamental methods necessary for training a neural network. Modules are serializable.

Modules contain two state variables: output and gradInput.
### [output] forward(input)

Takes an input object, and computes the corresponding output of the module. In general, input and output are Tensors. However, some special sub-classes like table layers might expect something else. Please refer to each module's specification for further information.

After a forward(), the output state variable should have been updated to the new value.

It is not advised to override this function. Instead, one should implement updateOutput(input). The forward method in the abstract parent class Module will call updateOutput(input).

### [gradInput] backward(input, gradOutput)

Performs a backpropagation step through the module, with respect to the given input. In general, this method assumes that forward(input) has been called before, with the same input. This is necessary for optimization reasons. If you do not respect this rule, backward() will compute incorrect gradients.

In general, input, gradOutput and gradInput are Tensors. However, some special sub-classes like table layers might expect something else. Please refer to each module's specification for further information.

A backpropagation step consists of computing two kinds of gradients at input, given gradOutput (the gradients with respect to the output of the module). This function simply performs this task using two function calls:

It is not advised to override this function in custom classes. It is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput, scale) functions.

### updateOutput(input)

Computes the output using the current parameter set of the class and input. This function returns the result, which is stored in the output field.

### updateGradInput(input, gradOutput)

Computes the gradient of the module with respect to its own input. This is returned in gradInput. Also, the gradInput state variable is updated accordingly.
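As a concrete illustration of the convention above, here is a minimal sketch (not from the source; the module name `nn.Scale` and its behaviour are assumptions for illustration) of a custom module that multiplies its input by a fixed constant, overriding updateOutput and updateGradInput rather than forward and backward:

```lua
require 'nn'

-- hypothetical module: scales the input by a constant factor
local Scale, parent = torch.class('nn.Scale', 'nn.Module')

function Scale:__init(factor)
   parent.__init(self)
   self.factor = factor
end

function Scale:updateOutput(input)
   -- output = factor * input; result is stored in self.output
   self.output:resizeAs(input):copy(input):mul(self.factor)
   return self.output
end

function Scale:updateGradInput(input, gradOutput)
   -- dL/dinput = factor * dL/doutput; result is stored in self.gradInput
   self.gradInput:resizeAs(gradOutput):copy(gradOutput):mul(self.factor)
   return self.gradInput
end
```

Because forward() and backward() in the parent class dispatch to these two methods, the module can be dropped into any container (nn.Sequential, etc.) without further work.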
### accGradParameters(input, gradOutput, scale)

Computes the gradient of the module with respect to its own parameters. Many modules do not perform this step as they do not have any parameters. The state variable name for the parameters is module dependent. The module is expected to accumulate the gradients with respect to the parameters in some variable.

scale is a scale factor that is multiplied with the gradParameters before being accumulated.

Zeroing this accumulation is achieved with zeroGradParameters(), and updating the parameters according to this accumulation is done with updateParameters().

### zeroGradParameters()

If the module has parameters, this will zero the accumulation of the gradients with respect to these parameters, accumulated through accGradParameters(input, gradOutput, scale) calls. Otherwise, it does nothing.

### updateParameters(learningRate)

If the module has parameters, this will update them according to the accumulation of the gradients with respect to these parameters, accumulated through backward() calls. The update is basically:

parameters = parameters - learningRate * gradients_wrt_parameters

If the module does not have parameters, it does nothing.

### accUpdateGradParameters(input, gradOutput, learningRate)

This is a convenience module that performs two functions at once. It calculates and accumulates the gradients with respect to the weights after multiplying by the negative of the learning rate learningRate. Performing these two operations at once is more efficient and may be advantageous in certain situations.

Keep in mind that this function uses a simple trick to achieve its goal, and it might not be valid for a custom module. Also note that, compared to accGradParameters(), the gradients are not retained for future use.
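The methods above fit together into one manual training step. A minimal sketch (the model, criterion, and data here are assumptions for illustration, not from the source):

```lua
require 'nn'

local model = nn.Linear(10, 2)
local criterion = nn.MSECriterion()

local input  = torch.randn(10)
local target = torch.randn(2)

model:zeroGradParameters()                 -- clear accumulated gradients
local output = model:forward(input)        -- calls updateOutput
local loss = criterion:forward(output, target)
local gradOutput = criterion:backward(output, target)
model:backward(input, gradOutput)          -- updateGradInput + accGradParameters
model:updateParameters(0.01)               -- parameters = parameters - lr * grads
```

Repeating this loop over a dataset is the basic pattern that higher-level training utilities wrap.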
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight = self.gradWeight
   local gradBias = self.gradBias
   self.gradWeight = self.weight
   self.gradBias = self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight = gradWeight
   self.gradBias = gradBias
end

As can be seen, the gradients are accumulated directly into the weights. This assumption may not be true for a module that computes a nonlinear operation.

### share(mlp,s1,s2,…,sn)

This function modifies the parameters of the modules named s1,…,sn (if they exist) so that they are shared with (pointers to) the parameters with the same names in the given module mlp.

The parameters have to be Tensors. This function is typically used if you want to have modules that share the same weights or biases.

Note that this function, if called on a Container module, will share the same parameters for all the contained modules as well.

Example:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- we change the bias of the first
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])

### clone(mlp,…)

Creates a deep copy of (i.e. not just a pointer to) the module, including the current state of its parameters (e.g. weights, biases etc., if any).

If arguments are provided to the clone(...) function, it also calls share(…) with those arguments on the cloned module after creating it, hence making a deep copy of this module with some shared parameters.

Example:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a copy that shares the weights and biases
mlp2=mlp1:clone('weight','bias');

-- we change the bias of the first mlp
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])

### type(type[, tensorCache])

This function converts all the parameters of a module to the given type. The type can be one of the types defined for torch.Tensor.

If tensors (or their storages) are shared between multiple modules in a network, this sharing will be preserved after type is called.

To preserve sharing between multiple modules and/or tensors, use nn.utils.recursiveType:

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- mlp1 and mlp2 will be converted to float, and will share bias
-- note: tensors can be provided as inputs as well as modules
nn.utils.recursiveType({mlp1, mlp2}, 'torch.FloatTensor')

### float([tensorCache])

Convenience method for calling module:type('torch.FloatTensor'[, tensorCache])

### double([tensorCache])

Convenience method for calling module:type('torch.DoubleTensor'[, tensorCache])

### cuda([tensorCache])

Convenience method for calling module:type('torch.CudaTensor'[, tensorCache])

### State Variables

These state variables are useful objects if one wants to check the guts of a Module. The object pointer is never supposed to change. However, its contents (including its size, if it is a Tensor) are supposed to change.

In general, state variables are Tensors. However, some special sub-classes like table layers contain something else. Please refer to each module's specification for further information.

#### output

This contains the output of the module, computed with the last call of forward(input).

#### gradInput

This contains the gradients with respect to the inputs of the module, computed with the last call of updateGradInput(input, gradOutput).

### Parameters and gradients w.r.t. parameters

Some modules contain parameters (the ones that we actually want to train!).
The names of these parameters, and of the gradients w.r.t. these parameters, are module dependent.

### [{weights}, {gradWeights}] parameters()

This function should return two tables: one for the learnable parameters {weights} and another for the gradients of the energy w.r.t. the learnable parameters {gradWeights}.

Custom modules should override this function if they use learnable parameters that are stored in tensors.

### [flatParameters, flatGradParameters] getParameters()

This function returns two tensors: one for the flattened learnable parameters flatParameters and another for the gradients of the energy w.r.t. the learnable parameters flatGradParameters.

Custom modules should not override this function. They should instead override parameters(…), which is, in turn, called by the present function.

This function will go over all the weights and gradWeights and make them views into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network.

### training()

This sets the mode of the Module (or sub-modules) to train=true. This is useful for modules like Dropout or BatchNormalization that have a different behaviour during training vs evaluation.

### evaluate()

This sets the mode of the Module (or sub-modules) to train=false. This is useful for modules like Dropout or BatchNormalization that have a different behaviour during training vs evaluation.

### findModules(typename)

Finds all instances of modules in the network with a certain typename. It returns a flattened list of the matching nodes, as well as a flattened list of the container modules for each matching node.

Modules that do not have a parent container (i.e., a top-level nn.Sequential for instance) will return their self as the container.

This function is very helpful for navigating complicated nested networks.
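getParameters() is most often used together with the optim package, since optimizers there operate on a single flat parameter tensor. A minimal sketch (the model, criterion, and data are assumptions for illustration):

```lua
require 'nn'
require 'optim'

local model = nn.Sequential():add(nn.Linear(10, 2))
local criterion = nn.MSECriterion()

-- call getParameters() exactly once, on the final assembled network
local params, gradParams = model:getParameters()

local input  = torch.randn(10)
local target = torch.randn(2)

-- closure evaluated by the optimizer: returns loss and gradient
local function feval(x)
   gradParams:zero()
   local output = model:forward(input)
   local loss = criterion:forward(output, target)
   model:backward(input, criterion:backward(output, target))
   return loss, gradParams
end

optim.sgd(feval, params, {learningRate = 0.01})
```

Because getParameters() re-allocates the underlying storages, calling it again after modifying the network invalidates the returned tensors, which is why it should be called only once.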
For example, if you wanted to print the output size of all nn.SpatialConvolution instances:

-- Construct a multi-resolution convolution network (with 2 resolutions):
model = nn.ParallelTable()
conv_bank1 = nn.Sequential()
conv_bank1:add(nn.SpatialConvolution(3,16,5,5))
conv_bank1:add(nn.Threshold())
model:add(conv_bank1)
conv_bank2 = nn.Sequential()
conv_bank2:add(nn.SpatialConvolution(3,16,5,5))
conv_bank2:add(nn.Threshold())
model:add(conv_bank2)

-- FPROP a multi-resolution sample
input = {torch.rand(3,128,128), torch.rand(3,64,64)}
model:forward(input)

-- Print the size of the convolution outputs
conv_nodes = model:findModules('nn.SpatialConvolution')
for i = 1, #conv_nodes do
  print(conv_nodes[i].output:size())
end

Another use might be to replace all nodes of a certain typename with another. For instance, if we wanted to replace all nn.Threshold with nn.Tanh in the model above:

threshold_nodes, container_nodes = model:findModules('nn.Threshold')
for i = 1, #threshold_nodes do
  -- Search the container for the current threshold node
  for j = 1, #(container_nodes[i].modules) do
    if container_nodes[i].modules[j] == threshold_nodes[i] then
      -- Replace with a new instance
      container_nodes[i].modules[j] = nn.Tanh()
    end
  end
end

### listModules()

Lists all Module instances in a network. Returns a flattened list of modules, including container modules (which will be listed first), self, and any other component modules.

For example:

mlp = nn.Sequential()
mlp:add(nn.Linear(10,20))
mlp:add(nn.Tanh())
mlp2 = nn.Parallel()
mlp2:add(mlp)
mlp2:add(nn.ReLU())
for i,module in ipairs(mlp2:listModules()) do
  print(module)
end

Which will result in the following output:

nn.Parallel {
 input
   |-> (1): nn.Sequential {
   |      [input -> (1) -> (2) -> output]
   |      (1): nn.Linear(10 -> 20)
   |      (2): nn.Tanh
   |    }
   |-> (2): nn.ReLU
    ...
 -> output
}
nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): nn.Linear(10 -> 20)
  (2): nn.Tanh
}
nn.Linear(10 -> 20)
nn.Tanh
nn.ReLU

### clearState()

Clears intermediate module states such as output, gradInput and others. Useful when serializing networks and running low on memory. Internally calls set() on the tensors, so it does not break buffer sharing.

### apply(function)

Calls the provided function on itself and all child modules. This function takes the module to operate on as its first argument:

model:apply(function(module)
   module.train = true
end)

In the example above, train will be set to true in all modules of model. This is how the training() and evaluate() functions are implemented.

### replace(function)

Similar to apply: takes a function which is applied to all modules of a model, but uses the return value to replace the module. Can be used to replace all modules of one type with another, or to remove certain modules.

For example, it can be used to remove nn.Dropout layers by replacing them with nn.Identity:

model:replace(function(module)
   if torch.typename(module) == 'nn.Dropout' then
      return nn.Identity()
   else
      return module
   end
end)

_______________________________________________________________________________

### narrow, select and copy: two types, with examples
https://github.com/torch/torch7/blob/master/doc/tensor.md#self-narrowdim-index-size
https://github.com/torch/nn/blob/master/doc/simple.md#nn.Cosine

Vision and language
http://visionandlanguage.net
http://handong1587.github.io/deep_learning/2015/10/09/nlp.html/

# Lemma, Theorem, Axiom, Statements

A statement is a sentence which has objective and logical meaning. A proposition is a statement which is offered up for investigation as to its truth or falsehood.

The term axiom is used throughout the whole of mathematics to mean a statement which is accepted as true for that particular branch. Different fields of mathematics usually have different sets of statements which are considered as being axiomatic.
The term theorem is used throughout the whole of mathematics to mean a statement which has been proved to be true from whichever axioms are relevant to that particular branch. So statements which are taken as axioms in one branch of mathematics may be theorems, or irrelevant, in others.

A definition lays down the meaning of a concept. It is a statement which tells the reader what something is.

A lemma is a statement which is proven during the course of reaching the proof of a theorem. Logically there is no qualitative difference between a lemma and a theorem. They are both statements whose value is either true or false. However, a lemma is seen more as a stepping-stone than a theorem in itself (and frequently takes a lot more work to prove than the theorem to which it leads). Some lemmas are famous enough to be named after the mathematician who proved them (for example: Abel's Lemma and Urysohn's Lemma), but they are still categorised as second-class citizens in the aristocracy of mathematics.

A corollary is a proof which is a direct result, or a direct application, of another proof. It can be considered as a proof for free on the back of a proof which has been paid for with blood, sweat and tears. The word is ultimately derived from the Latin corolla, meaning small garland, or the money paid for it. Hence it has the sense of something extra: a lagniappe or freebie.

————————————————————

# Difference between axioms, theorems, postulates, corollaries, and hypotheses

Based on logic, an axiom or postulate is a statement that is considered to be self-evident. Both axioms and postulates are assumed to be true without any proof or demonstration. Basically, something that is obvious or declared to be true and accepted, but for which there is no proof, is called an axiom or a postulate. Axioms and postulates serve as a basis for deducing other truths. The ancient Greeks recognized the difference between these two concepts.
Axioms are self-evident assumptions, which are common to all branches of science, while postulates are related to a particular science.

Axioms

Aristotle himself used the term "axiom", which comes from the Greek "axioma", meaning "to deem worthy", but also "to require". Aristotle had some other names for axioms. He used to call them "the common things" or "common opinions".

In mathematics, axioms can be categorized as "logical axioms" and "non-logical axioms". Logical axioms are propositions or statements which are considered universally true. Non-logical axioms, sometimes called postulates, define properties for the domain of a specific mathematical theory, or logical statements which are used in deduction to build mathematical theories. "Things which are equal to the same thing are equal to one another" is an example of a well-known axiom laid down by Euclid.

Postulates

The term "postulate" is from the Latin "postulare", a verb which means "to demand". The master demanded of his pupils that they agree to certain statements upon which he could build. Unlike axioms, postulates aim to capture what is special about a particular structure. "It is possible to draw a straight line from any point to any other point", "It is possible to produce a finite straight line continuously in a straight line", and "It is possible to describe a circle with any center and any radius" are a few examples of postulates illustrated by Euclid.

What is the difference between Axioms and Postulates?

• An axiom generally is true for any field in science, while a postulate can be specific to a particular field.
• It is impossible to prove an axiom from other axioms, while postulates are provable from axioms.

In Geometry, "Axiom" and "Postulate" are essentially interchangeable. In antiquity, they referred to propositions that were "obviously true" and only had to be stated, and not proven. In modern mathematics there is no longer an assumption that axioms are "obviously true".
Axioms are merely 'background' assumptions we make. The best analogy I know is that axioms are the "rules of the game".

In Euclid's Geometry, the main axioms/postulates are:

1. Given any two distinct points, there is a line that contains them.
2. Any line segment can be extended to an infinite line.
3. Given a point and a radius, there is a circle with center in that point and that radius.
4. All right angles are equal to one another.
5. If a straight line falling on two straight lines makes the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which the angles are less than the two right angles. (The parallel postulate.)

A theorem is a logical consequence of the axioms. In Geometry, the "propositions" are all theorems: they are derived using the axioms and the valid rules.

A "Corollary" is a theorem that is usually considered an "easy consequence" of another theorem. What is or is not a corollary is entirely subjective. Sometimes what an author thinks is a 'corollary' is deemed more important than the corresponding theorem. (The same goes for "Lemma"s, which are theorems considered auxiliary to proving some other theorem that is more important in the view of the author.)

A "hypothesis" is an assumption made. For example: "If x is an even integer, then x² is an even integer." I am not asserting that x² is even or odd; I am asserting that if something happens (namely, if x happens to be an even integer) then something else will also happen. Here, "x is an even integer" is the hypothesis being made in order to prove the statement.

1. Since it is not possible to define everything, as that leads to a never-ending infinite loop of circular definitions, mathematicians get out of this problem by imposing "undefined terms": words we never define. In most mathematics the two undefined terms are set and element of.

2. We would like to be able to prove various things concerning sets.
But how can we do so if we never defined what a set is? So what mathematicians do next is impose a list of axioms. An axiom is some property of your undefined object. So even though you never define your undefined terms, you have rules about them. The rules that govern them are the axioms. One does not prove an axiom; in fact, one can choose it to be anything one wishes (of course, if this is done mindlessly it will lead to something trivial).

3. Now that we have our axioms and undefined terms, we can form some main definitions for what we want to work with.

4. After we have defined some things, we can write down some basic proofs, usually known as propositions. Propositions are those mathematical facts that are generally straightforward to prove and generally follow easily from the definitions.

5. Deep propositions that are an overview of all your currently collected facts are usually called Theorems. A good litmus test to know the difference between a Proposition and a Theorem, as somebody once remarked, is that if you are proud of a proof you call it a Theorem, otherwise you call it a Proposition. Think of theorems as the end goals we would like to reach: deep connections that are also very beautiful results.

6. Sometimes, in proving a Proposition or a Theorem, we need some technical facts. Those are called Lemmas. Lemmas are usually not useful by themselves. They are only used to prove a Proposition/Theorem, and then we forget about them.

7. The net collection of definitions, propositions and theorems forms a mathematical theory.

Technically, axioms are self-evident or self-proving, while postulates are simply taken as given. However, really only Euclid, really high-end theorists and some polymaths make such a distinction. See http://www.friesian.com/space.htm

Theorems are then derived from the "first principles", i.e. the axioms and postulates.

——————————————————–

Lemma is generally used to describe a "helper" fact that is used in the proof of a more significant result.
Significant results are frequently called theorems. Short, easy results of theorems are called corollaries. But the words aren't exactly set in stone.

# The Difference Between a Fact, Hypothesis, Theory, and Law in Science

Words like "fact", "theory", and "law" get thrown around a lot. When it comes to science, however, they mean something very specific, and knowing the difference between them can help you better understand the world of science as a whole.

In this video from the It's Okay To Be Smart YouTube channel, host Joe Hanson clears up some of the confusion surrounding four very important scientific terms: fact, hypothesis, theory, and law. Knowing the difference between these words is the key to understanding news, studies, and any other information that comes from the scientific community. Here are the main takeaways:

• Fact: An observation about the world around us. Example: "It's bright outside."
• Hypothesis: A proposed explanation for a phenomenon made as a starting point for further investigation. Example: "It's bright outside because the sun is probably out."
• Theory: A well-substantiated explanation acquired through the scientific method and repeatedly tested and confirmed through observation and experimentation. Example: "When the sun is out, it tends to make it bright outside."
• Law: A statement based on repeated experimental observations that describes some phenomenon of nature. Proof that something happens and how it happens, but not why it happens. Example: Newton's Law of Universal Gravitation.

Essentially, this is how all science works. You probably knew some of this, or remember bits and pieces of it from grade school, but the video does a great job of explaining the entire process. When you know how something actually works, it makes it a lot easier to understand and scrutinize.
——————————————————————-

Ref:
https://www.quora.com/What-are-the-differences-between-theorems-definitions-axioms-lemmas-corollaries-propositions-and-statements
http://functionspace.com/topic/3465/Axioms–Postulates–Theorems–Corollaries–Hypotheses–Theories
http://math.stackexchange.com/questions/7717/difference-between-axioms-theorems-postulates-corollaries-and-hypotheses
http://math.stackexchange.com/questions/463362/difference-between-theorem-lemma-and-corollary
http://lifehacker.com/the-difference-between-a-fact-hypothesis-theory-and-1732904200

————————————————————————————–

# About CNN

Ref: [Srinivas et al.]

# Case Study of Convolutional Neural Network

Ref:
http://www.slideshare.net/jbhuang/lecture-29-convolutional-neural-networks-computer-vision-spring2015
http://www.mdpi.com/2072-4292/7/11

# Transfer Learning

### Action and Attributes from Wholes and Parts
### R-CNNs for Pose Estimation and Action Detection
### Contextual Action Recognition with R*CNN

—————————————————————-

http://www.mshahriarinia.com/home/ai/machine-learning/neural-networks/deep-learning/theano-mnist/3-convolutional-neural-network-lenet

Good one:
http://rnduja.github.io/2015/10/12/deep_learning_with_torch_step_5_rnn_lstm/

# Implementation Issues for VQA

1. Jupyter Setup Issue

$ jupyter-notebook
Traceback (most recent call last):
File "/usr/local/bin/jupyter-notebook", line 7, in <module>
from notebook.notebookapp import main
File "/usr/local/lib/python2.7/dist-packages/notebook/notebookapp.py",
line 45, in <module>
raise ImportError(msg + ", but you have %s" % tornado.version)
ImportError: The Jupyter Notebook requires tornado >= 4.0, but you have 3.1.1
------------------------------------------------------------------
Answer: This problem will be seen if you have installed both ipython-notebook and jupyter-notebook.

It can be solved by uninstalling ipython-notebook.

2. CuDNN Problems

user@user-XPS-8500:~/neural-style$ th neural_style.lua -gpu 0 -backend cudnn
nil
Then make sure files named libcudnn.so.4 or libcudnn.4.dylib are placed in your library load path (for example /usr/local/lib, or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/home/user/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
neural_style.lua:64: in function 'main'
neural_style.lua:500: in main chunk
[C]: in function 'dofile'
.../user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
user@user-XPS-8500:~/neural-style$

-----------------------------------------------

Solution:

After registering as a developer with NVIDIA, you can download cuDNN from the NVIDIA website. Make sure to download Version 4. After downloading, you can unpack and install cuDNN like this:

tar -xzvf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-7.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.0/include/

Also check your LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

You should see /usr/local/cuda-7.0/lib64 along with possibly other things.

If you do not see it, you need to add something like this to your .bashrc or other startup script:

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

Then run source ~/.bashrc, which should solve the problem.

———————————————————————————–

# Setup for Torch

## Steps for torch installation

1. rm -rf ~/torch
2. git clone https://github.com/torch/distro.git ~/torch --recursive
3. cd ~/torch
4. ./install.sh
5. source ~/.bashrc

## Update

To update an already installed distro to the latest master branch:

./update.sh

## Test

To test that all libraries are installed properly, run:

./test.sh

For other packages:

1. luarocks install rnn
2. luarocks install loadcaffe
3. luarocks install hdf5

————————————————————————————-

If you face a problem in the installation like:

issue: cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator "==" matches these operands:

[ 14%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o
/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(393): error: more than one operator "==" matches these operands:
            function "operator==(const __half &, const __half &)"
            function "operator==(half, half)"
            operand types are: half == half
/home/ubuntu/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu(414): error: more than one operator "==" matches these operands:
            function "operator==(const __half &, const __half &)"
            function "operator==(half, half)"
            operand types are: half == half
[ 15%] Building NVCC (Device) object lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathReduce.cu.o
2 errors detected in the compilation of "/tmp/tmpxft_00002141_00000000-4_THCTensorMath.cpp4.ii".
CMake Error at THC_generated_THCTensorMath.cu.o.cmake:267 (message):
  Error generating file /home/ubuntu/torch/extra/cutorch/build/lib/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorMath.cu.o
lib/THC/CMakeFiles/THC.dir/build.make:112: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMath.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
^Clib/THC/CMakeFiles/THC.dir/build.make:105: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorCopy.cu.o] Interrupt
lib/THC/CMakeFiles/THC.dir/build.make:140: recipe for target 'lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o' failed
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorMathPairwise.cu.o] Interrupt
CMakeFiles/Makefile2:172: recipe for target 'lib/THC/CMakeFiles/THC.dir/all' failed
make[1]: *** [lib/THC/CMakeFiles/THC.dir/all] Interrupt
Makefile:127: recipe for target 'all' failed
make: *** [all] Interrupt
Error: Build error: Failed building.

Solution:

1. ./clean.sh (clear the installation)
2. export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
3. ./install.sh

Ref:
https://github.com/torch/cutorch/issues/797
https://github.com/torch/distro/issues/239

# Issue:

Data parallel: arguments are located on different GPUs: /usr/local/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu:21
stack traceback:

Solution: This is a problem with the makeDataParallel function, which uses cutorch. Reinstall torch with the latest release, or update and then reinstall:

• luarocks install cutorch
• luarocks install cunn

Issue:

$ luarocks install cutorch

Solution: Reinstall torch

Reference:

http://torch.ch/

https://en.wikipedia.org/wiki/Torch_(machine_learning)

# Torch Installation

## Step 0: Installing Torch

# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
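After install.sh finishes, a quick sanity check can confirm the launcher landed in place. This is a hedged sketch (not part of the official install scripts) that assumes the default ~/torch prefix used above:

```shell
# Check that the `th` launcher exists under the default install prefix.
if [ -x "$HOME/torch/install/bin/th" ]; then
  status="Torch installed at $HOME/torch/install"
else
  status="th not found: re-run ./install.sh or check its output"
fi
echo "$status"
```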


The first script installs the basic package dependencies that LuaJIT and Torch require. The second script installs LuaJIT, LuaRocks, and then uses LuaRocks (the lua package manager) to install core packages like torch, nn and paths, as well as a few other packages.

The script adds torch to your PATH variable. You just have to source it once to refresh your env variables. The installation script will detect what is your current shell and modify the path in the correct configuration file.

# On Linux with bash
source ~/.bashrc


If you ever need to uninstall torch, simply run the command:

rm -rf ~/torch


## Step 1: Install torch with Lua 5.2

If you want to install torch with Lua 5.2 instead of LuaJIT, simply run:

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch

# clean old torch installation
./clean.sh

# https://github.com/torch/distro : set env to use lua
TORCH_LUA_VERSION=LUA52 ./install.sh


## Step 3: Install required packages using Luarocks

New packages can be installed using Luarocks from the command-line:

# Install using luarocks
luarocks install torch
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install optim
luarocks install lua-cjson
luarocks install torch-word-emb
luarocks install rnn
luarocks install nnx
luarocks install dp
luarocks install dpnn
luarocks install itorch
luarocks install sys
luarocks install xlua
luarocks install penlight
luarocks install display
luarocks install gnuplot
luarocks install imgraph
luarocks install signal

# optional
luarocks install senna
luarocks install cephes

-----------------------------------------------------------

## Step 4: Install torch-hdf5

# We need to install torch-hdf5 from GitHub
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

## Step 5: Install loadcaffe

loadcaffe depends on Google’s Protocol Buffer library, so we’ll need to install that first:

sudo apt-get install libprotobuf-dev protobuf-compiler

Now we can install loadcaffe:

luarocks install loadcaffe

## Step 6: Install CUDA backend for torch (For GPU Only)

If you’d like to train on an NVIDIA GPU using CUDA (this can be about 15x faster), you’ll of course need the GPU, and you will have to install the CUDA Toolkit. Then get the cutorch and cunn packages:

luarocks install cutorch
luarocks install cunn
luarocks install cudnn

## Step 7: Install OpenCL backend for torch (Optional)

If you’d like to use an OpenCL GPU instead (e.g. ATI cards), you will instead need to install the cltorch and clnn packages, and then use the option -opencl 1 during training (cltorch issues):

luarocks install cltorch
luarocks install clnn

## Step 8: Install cuDNN (Optional)

cuDNN is a library from NVIDIA that efficiently implements many of the operations (like convolutions and pooling) that are commonly used in deep learning. After registering as a developer with NVIDIA, you can download cuDNN here. Make sure to download Version 4.
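Step 6 assumes a working CUDA Toolkit. A hedged pre-check (my addition, not from the original guide) is to verify that the CUDA compiler nvcc is on the PATH before attempting the luarocks commands:

```shell
# Only try the CUDA packages when nvcc is actually available.
if command -v nvcc >/dev/null 2>&1; then
  msg="CUDA toolkit found: safe to install cutorch/cunn"
  # luarocks install cutorch && luarocks install cunn
else
  msg="No CUDA toolkit on PATH: stay on CPU or install CUDA first"
fi
echo "$msg"
```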
After downloading, you can unpack and install cuDNN like this:

tar -xzvf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-7.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-7.0/include/

Also check your LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

You should see /usr/local/cuda-7.0/lib64 along with possibly other things.
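The LD_LIBRARY_PATH check and edit can also be scripted so that repeated `source ~/.bashrc` calls don't keep duplicating the entry. A minimal sketch, with `prepend_ld_path` being a hypothetical helper name:

```shell
# Prepend a directory to LD_LIBRARY_PATH only if it is not already present.
prepend_ld_path() {
  dir="$1"
  case ":${LD_LIBRARY_PATH}:" in
    *":${dir}:"*) ;;   # already present: do nothing
    *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
  export LD_LIBRARY_PATH
}

LD_LIBRARY_PATH="/usr/lib"
prepend_ld_path /usr/local/cuda-7.0/lib64
prepend_ld_path /usr/local/cuda-7.0/lib64   # second call is a no-op
echo "$LD_LIBRARY_PATH"                     # prints /usr/local/cuda-7.0/lib64:/usr/lib
```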

If you don’t see it, you need to add something like this to your .bashrc or other startup scripts:

export LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64:$LD_LIBRARY_PATH

then source ~/.bashrc. This solves the following cuDNN problem:

user@user-XPS-8500:~/neural-style$ th neural_style.lua -gpu 0 -backend cudnn
nil
Then make sure files named as libcudnn.so.4 or libcudnn.4.dylib are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/home/user/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
neural_style.lua:64: in function 'main'
neural_style.lua:500: in main chunk
[C]: in function 'dofile'
.../user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
user@user-XPS-8500:~/neural-style$----------------------------------------------- Next we need to install the torch bindings for cuDNN: luarocks install cudnn ------------------------------------------------------------------------  Ref http://torch.ch/docs/getting-started.html https://github.com/torch/torch7/wiki/Cheatsheet https://github.com/jcjohnson/neural-style/issues/154 https://github.com/jcjohnson/neural-style/blob/master/INSTALL.md https://github.com/karpathy/char-rnn https://github.com/karpathy/neuraltalk2  # Install Torch Locally https://github.com/torch/distro  ## Self-contained Torch installation Install dependencies. Usesapt-get on Ubuntu, which might require sudo. Usesbrew on OSX. curl -s https://raw.githubusercontent.com/torch/distro/master/install-deps | bash Install this repo, which installs the torch distribution, with a lot of nice goodies. git clone https://github.com/torch/distro.git ~/torch --recursive cd ~/torch; ./install.sh By default Torch will install LuaJIT 2.1. If you want other options, you can use the command: TORCH_LUA_VERSION=LUA51 ./install.sh TORCH_LUA_VERSION=LUA52 ./install.sh Now, everything should be installed. Either open a new shell, or source your profile via . ~/.bashrc # or: . ~/.zshrc th -e "print 'I just installed Torch! Yesss.'" Note: If you use a non-standard shell, you’ll want to run this command ./install/bin/torch-activate Tested on Ubuntu 14.04, CentOS/RHEL 6.3 and OSX ## for more information, plz check https://github.com/torch/torch7/issues/27 luarocks install --deps-mode=all --local nngraph  # Torch Installation 3 Steps Step:1 git clone https://github.com/torch/distro.git ~/torch --recursive cd ~/torch; ./install.sh By default Torch will install LuaJIT 2.1. If you want other options, you can use the command: step 2: (optional) TORCH_LUA_VERSION=LUA51 ./install.sh  Now, everything should be installed. Either open a new shell, or source your profile via step 3: . ~/.bashrc th -e "print 'I just installed Torch! 
Yesss.'"  # Torch uninstallation rm -rf ~/torch  # luarocks package Installation luarocks install cunn 1.0.0 luarocks remove cutorch # luarocks package remove remove – Uninstall a rock. luarocks remove --force cunn 1.0.0 luarocks remove --force cutorch REF: https://github.com/keplerproject/luarocks/wiki/luarocks https://github.com/keplerproject/luarocks/wiki/remove http://torch.ch/docs/getting-started.html https://github.com/torch/torch7/wiki/Cheatsheet  # Setup for Keras (Tensorflow Backend) and for Keras (Theano Backend) Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. In order to install Keras, it requires miniconda on python 2.x . Also, Keras uses the following dependencies: 1. NumPy 2. SciPy 3. HDF5 and h5py 4. Theano 5. TensorFlow ## Step 0: Install miniconda $ wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-uname -p.sh
$bash Miniconda-latest-Linux-uname -p.sh$ source ~/.bashrc
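The installer file name embeds the machine's processor architecture. A small sketch that builds the name explicitly, using `uname -m` (often more reliable than `uname -p`) and assuming the URL pattern above:

```shell
# Build the Miniconda installer name from the machine architecture.
arch="$(uname -m)"                                  # e.g. x86_64
installer="Miniconda-latest-Linux-${arch}.sh"
url="https://repo.continuum.io/miniconda/${installer}"
echo "$installer"
echo "$url"
```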

## Step 1: Numpy, scipy, nose

a: NumPy

NumPy is the fundamental package needed for scientific computing with Python. This package contains:

• a powerful N-dimensional array object
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities.

b: SciPy

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages: NumPy, the SciPy library, Matplotlib, IPython, SymPy, pandas.

c: Nose

nose extends unittest to make testing easier.

d: pip

pip is the preferred installer program: a package management system used to install and manage software packages written in Python.

## Step 3: Install Theano or TensorFlow

Please install either Theano or TensorFlow. If you want to use Keras with Theano as the backend, don't install TensorFlow, and vice versa.

## A: Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. 

• tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
• transparent use of a GPU – Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
• efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
• speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
• dynamic C code generation – Evaluate expressions faster.
• extensive unit-testing and self-verification – Detect and diagnose many types of errors.

## Step 5: spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.
Documentation and details: https://spacy.io/

### Install spaCy with conda

$ conda install -c https://conda.anaconda.org/spacy spacy
$ conda install spacy
$ python -m spacy.en.download all --force

ref: https://github.com/spacy-io/spaCy

## Step 6: scikit-learn (Machine Learning in Python)

scikit-learn is a Python module for machine learning built on top of SciPy.

• Simple and efficient tools for data mining and data analysis
• Built on NumPy, SciPy, and matplotlib

### Install scikit-learn with conda

$ conda install scikit-learn

ref:

http://scikit-learn.org/stable/

https://github.com/scikit-learn/scikit-learn

## Step 7: Python Progressbar

It is a text progress bar library for Python, typically used to display the progress of a long-running operation, providing a visual cue that processing is underway.

$ echo '{"epsilon":1e-07,"floatx":"float32","backend":"tensorflow"}' > ~/.keras/keras.json

## Step 10: Make sure that Keras runs with TensorFlow

$ curl -sSL https://github.com/fchollet/keras/raw/master/examples/mnist_mlp.py | python
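The keras.json step can be rehearsed against a local file first, so your real ~/.keras/keras.json is untouched (the keras-demo.json name here is just for illustration):

```shell
# Generate a keras.json for a chosen backend into a local demo file.
backend="tensorflow"   # switch to "theano" for the Theano backend
printf '{"epsilon":1e-07,"floatx":"float32","backend":"%s"}\n' "$backend" > keras-demo.json
cat keras-demo.json
```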


------------------------------------------------------------------------------------

Reference

http://keras.io/

http://ermaker.github.io/blog/2015/09/08/get-started-with-keras-for-beginners.html

http://ermaker.github.io/blog/2016/06/22/get-started-with-keras-for-beginners-tensorflow-backend.html