Ref:

http://prog3.com/sbdm/blog/lanran2/article/details/50603861

https://apaszke.github.io/lstm-explained.html

**What is an RNN?**

RNN: A multi-layer feedback RNN (recurrent neural network) is a kind of artificial neural network whose node connections form cycles. The internal state of the network lets it exhibit dynamic temporal behavior. Unlike feedforward neural networks, an RNN can use its internal memory to process input sequences of arbitrary length, which makes tasks such as unsegmented handwriting recognition and speech recognition easier to handle. (Source: Baidu Encyclopedia)

In abstract form, an RNN is described by the following formulas:

$$h_t = \theta \, \phi(h_{t-1}) + \theta_x x_t$$

$$y_t = \theta_y \, \phi(h_t)$$

Note that at every time step the RNN uses the hidden-layer output of the previous time step, $h_{t-1}$.
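The recurrence can be traced by hand in the scalar case. The following is a minimal sketch in plain Lua, not the author's code: it assumes $\phi = \tanh$, and the parameter values `theta`, `theta_x`, `theta_y` and the helper `rnn_forward` are made up for illustration.

```lua
-- Scalar sketch of h_t = theta * phi(h_{t-1}) + theta_x * x_t
-- and y_t = theta_y * phi(h_t), assuming phi = tanh.

-- Hand-written tanh, because math.tanh is not available in every Lua version.
local function tanh(x)
  local e = math.exp(-2 * math.abs(x))
  local t = (1 - e) / (1 + e)
  if x < 0 then return -t end
  return t
end

-- Hypothetical parameter values, chosen only for illustration.
local theta, theta_x, theta_y = 0.5, 1.0, 2.0

-- Run the recurrence over a sequence, carrying the hidden state forward.
local function rnn_forward(xs, h0)
  local h, hs, ys = h0, {}, {}
  for t = 1, #xs do
    h = theta * tanh(h) + theta_x * xs[t]  -- h on the right is h_{t-1}
    hs[t] = h
    ys[t] = theta_y * tanh(h)
  end
  return hs, ys
end

local hs, ys = rnn_forward({1.0, 0.0, -1.0}, 0.0)
for t = 1, #hs do
  print(string.format("t=%d  h=%.4f  y=%.4f", t, hs[t], ys[t]))
end
```

The single variable `h` carried through the loop is the only memory the network has, which is why each step depends on the previous one.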

**The shortcoming of the traditional RNN: the vanishing gradient problem**

We define the loss function as $E$. The gradient is then given by the following formulas:

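$$\frac{\partial E}{\partial \theta} = \sum_{t=1}^{S} \frac{\partial E_t}{\partial \theta}$$

$$\frac{\partial E_t}{\partial \theta} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial y_t} \frac{\partial y_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial h_k}{\partial \theta}$$

$$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \theta^T \operatorname{diag}[\phi'(h_{i-1})]$$

$$\left\| \frac{\partial h_i}{\partial h_{i-1}} \right\| \le \left\| \theta^T \right\| \left\| \operatorname{diag}[\phi'(h_{i-1})] \right\| \le \gamma_\theta \gamma_\phi$$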

$$\left\| \frac{\partial h_t}{\partial h_k} \right\| \le (\gamma_\theta \gamma_\phi)^{t-k}$$

When $\gamma_\theta \gamma_\phi < 1$, the gradient is repeatedly multiplied by a number smaller than 1, so it becomes smaller and smaller as $t-k$ grows. LSTM was introduced to solve this problem.
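To get a feeling for how fast this bound shrinks, here is a small plain-Lua sketch. The helper `gradient_bound` and the value 0.9 for $\gamma_\theta$ are hypothetical; $\gamma_\phi = 1$ is the bound on the derivative of tanh.

```lua
-- Upper bound on the norm of dh_t/dh_k: (gamma_theta * gamma_phi)^(t-k).
local function gradient_bound(gamma_theta, gamma_phi, steps)
  return (gamma_theta * gamma_phi) ^ steps
end

-- gamma_theta = 0.9 is a made-up value; gamma_phi = 1 bounds |tanh'|.
for _, steps in ipairs({1, 10, 50, 100}) do
  local bound = gradient_bound(0.9, 1.0, steps)
  print(string.format("t-k=%3d  bound=%.3e", steps, bound))
end
```

Even with a factor as close to 1 as 0.9, the bound falls by several orders of magnitude over 100 steps, so early time steps receive almost no gradient signal.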

**Introduction to LSTM**

Definition: LSTM (Long Short-Term Memory) is a recurrent neural network architecture; the paper was first published in 1997. Thanks to its unique design, LSTM is well suited to processing and predicting important events in a time series that are separated by very long intervals and delays. (Source: Baidu Encyclopedia)

Any introduction to LSTM is almost always accompanied by a figure like the following:

[Figure: diagram of an LSTM cell]

As the figure shows, in addition to the input there are three parts: 1) the input gate; 2) the forget gate; 3) the output gate.

As in the RNN described above, the inputs are $x_t$ and $h_{t-1}$ (together with the previous cell state $c_{t-1}$), while the outputs are $h_t$ and $c_t$ (the cell state). The cell state is the key to LSTM: it is what gives LSTM its memory. The formulas for LSTM are as follows:

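**1) Input gate:**

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) = \sigma(\mathrm{linear}_{xi}(x_t) + \mathrm{linear}_{hi}(h_{t-1}))$$

where $\sigma$ denotes the sigmoid function.

**2) Forget gate:** decides whether to erase or retain the memory

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$

**3) Output gate:**

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

**4) Cell update:**

$$g_t = \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g)$$

**5) Cell state update:**

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

**6) Final LSTM output:**

$$h_t = o_t \odot \tanh(c_t)$$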
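The six formulas can be checked by hand in the scalar case. Below is a minimal plain-Lua sketch, not the author's implementation: the helper `lstm_step` and the parameter table `p` (with scalar fields such as `Wxi`, `Whi`, `bi`) are hypothetical, and all parameters are set to zero only to make the arithmetic easy to follow.

```lua
local function sigmoid(x)
  return 1 / (1 + math.exp(-x))
end

-- Hand-written tanh, because math.tanh is not available in every Lua version.
local function tanh(x)
  local e = math.exp(-2 * math.abs(x))
  local t = (1 - e) / (1 + e)
  if x < 0 then return -t end
  return t
end

-- One LSTM step for scalar x_t, h_{t-1} and c_{t-1}.
local function lstm_step(p, x, h_prev, c_prev)
  local i = sigmoid(p.Wxi * x + p.Whi * h_prev + p.bi)  -- 1) input gate
  local f = sigmoid(p.Wxf * x + p.Whf * h_prev + p.bf)  -- 2) forget gate
  local o = sigmoid(p.Wxo * x + p.Who * h_prev + p.bo)  -- 3) output gate
  local g = tanh(p.Wxg * x + p.Whg * h_prev + p.bg)     -- 4) cell update
  local c = f * c_prev + i * g                          -- 5) cell state update
  local h = o * tanh(c)                                 -- 6) final output
  return h, c
end

-- With all parameters zero, every gate equals sigmoid(0) = 0.5 and g = tanh(0) = 0.
local p = {
  Wxi = 0, Whi = 0, bi = 0,
  Wxf = 0, Whf = 0, bf = 0,
  Wxo = 0, Who = 0, bo = 0,
  Wxg = 0, Whg = 0, bg = 0,
}
local h, c = lstm_step(p, 1.0, 0.0, 2.0)
print(h, c)
```

With these values the new cell state is simply half of the old one, because the forget gate is 0.5 and the input branch contributes nothing.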

Those are the formulas for the LSTM cell. **Next we explain why LSTM can solve the vanishing gradient problem of the RNN.**

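Along the cell state, the update $c_t = f_t \odot c_{t-1} + i_t \odot g_t$ means that each factor in the gradient product is the forget gate $f_t$. When the forget gate stays very close to 1, each factor is very close to 1, so the gradient decays only slowly, which alleviates the vanishing gradient problem.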
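As a rough sketch of this argument, and not a full derivation, differentiate the cell state update while ignoring the indirect dependence of the gates on $c_{i-1}$ through $h_{i-1}$ (a simplifying assumption):

$$\frac{\partial c_i}{\partial c_{i-1}} \approx \operatorname{diag}(f_i), \qquad \frac{\partial c_t}{\partial c_k} \approx \prod_{i=k+1}^{t} \operatorname{diag}(f_i)$$

Compared with the RNN product $\prod_{i=k+1}^{t} \theta^T \operatorname{diag}[\phi'(h_{i-1})]$, there is no repeated multiplication by a weight matrix or by an activation derivative, and the network can learn to keep $f_i$ near 1 for information it needs to remember.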

**Torch nngraph**

Before writing the LSTM in Torch, we need to learn a Torch tool called nngraph. Install nngraph with the following command:

`luarocks install nngraph`

For a detailed introduction to nngraph, see: https://github.com/torch/nngraph/

nngraph makes it easy to design a neural network module. We first use nngraph to create a simple network module:

$$z = x_1 + x_2 \odot \mathrm{linear}(x_3)$$

This module has three inputs, $x_1$, $x_2$ and $x_3$, and one output, $z$. The Torch code that implements this module is:

```
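require 'nngraph'
-- Input placeholder nodes
x1 = nn.Identity()()
x2 = nn.Identity()()
x3 = nn.Identity()()
-- z = x1 + x2 .* linear(x3), with a linear layer from 20 inputs to 10 outputs
L = nn.CAddTable()({x1, nn.CMulTable()({x2, nn.Linear(20, 10)(x3)})})
-- Wrap the graph into a module with three inputs and one output
mlp = nn.gModule({x1, x2, x3}, {L})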
```

First we define $x_1$, $x_2$ and $x_3$ using **nn.Identity()()**. For $\mathrm{linear}(x_3)$ we use **x4 = nn.Linear(20, 10)(x3)**, which defines a linear layer with 20 neurons in the input layer and 10 neurons in the output layer. For $x_2 \odot \mathrm{linear}(x_3)$ we use **x5 = nn.CMulTable()({x2, x4})**; for $x_1 + x_2 \odot \mathrm{linear}(x_3)$ we use **nn.CAddTable()({x1, x5})**; finally we use **nn.gModule({input}, {output})** to define the neural network module.
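The same module can also be written one node at a time, which mirrors the description above more closely. This is only a sketch, assuming a working Torch7 installation with the nn and nngraph packages; the random inputs are there just to exercise the module.

```lua
require 'nn'
require 'nngraph'

-- Input placeholder nodes
local x1 = nn.Identity()()
local x2 = nn.Identity()()
local x3 = nn.Identity()()

-- linear(x3): a linear layer mapping 20 inputs to 10 outputs
local x4 = nn.Linear(20, 10)(x3)
-- Element-wise product of x2 and linear(x3)
local x5 = nn.CMulTable()({x2, x4})
-- Element-wise sum of x1 and the product above
local z = nn.CAddTable()({x1, x5})

-- Wrap the graph into a module with three inputs and one output
local mlp = nn.gModule({x1, x2, x3}, {z})

-- Feed random inputs of matching sizes and inspect the size of the output
local out = mlp:forward({torch.randn(10), torch.randn(10), torch.randn(20)})
print(out:size())
```

Naming the intermediate nodes costs a few extra lines but makes a larger graph much easier to read and debug than a single nested expression.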

We use the forward method to test whether our Module is correct:

```
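-- Three inputs: h1 and h2 have 10 elements, h3 has 20
h1 = torch.Tensor{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
h2 = torch.Tensor(10):fill(1)
h3 = torch.Tensor(20):fill(2)
-- Output computed by the nngraph module
b = mlp:forward({h1, h2, h3})
-- Weight matrix and bias of the linear layer
parameters = mlp:parameters()[1]
bias = mlp:parameters()[2]
-- The same value computed by hand: h1 + h2 .* (W * h3 + bias)
result = torch.cmul(h2, (parameters * h3 + bias)) + h1
-- Distance between the two results; it should be (close to) zero
print(torch.dist(b, result))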
```

First we define three inputs $h_1$, $h_2$ and $h_3$, then call the forward method of the module mlp to get the output b. Next we fetch the weight matrix and the bias of the network and store them in the variables parameters and bias, and compute $z = h_1 + h_2 \odot \mathrm{linear}(h_3)$ by hand as **result = torch.cmul(h2, (parameters * h3 + bias)) + h1**. Finally we compare b and result: the two results are the same, which shows that our module is correct.

**Writing the LSTM module with nngraph**

Now we use nngraph to write the LSTM module described above. The code is as follows:

```
require 'nngraph'

function lstm(xt, prev_c, prev_h)
  -- Each call creates a fresh pair of linear layers: W_x * xt + W_h * prev_h
  local function new_input_sum()
    local i2h = nn.Linear(400, 400)
    local h2h = nn.Linear(400, 400)
    return nn.CAddTable()({i2h(xt), h2h(prev_h)})
  end
  local input_gate = nn.Sigmoid()(new_input_sum())
  local forget_gate = nn.Sigmoid()(new_input_sum())
  local output_gate = nn.Sigmoid()(new_input_sum())
  local gt = nn.Tanh()(new_input_sum())
  -- ct = forget_gate .* prev_c + input_gate .* gt
  local ct = nn.CAddTable()({nn.CMulTable()({forget_gate, prev_c}), nn.CMulTable()({input_gate, gt})})
  -- ht = output_gate .* tanh(ct)
  local ht = nn.CMulTable()({output_gate, nn.Tanh()(ct)})
  return ct, ht
end

xt = nn.Identity()()
prev_c = nn.Identity()()
prev_h = nn.Identity()()
-- The function lstm is called first; its name is then rebound to the module
lstm = nn.gModule({xt, prev_c, prev_h}, {lstm(xt, prev_c, prev_h)})
```

Here **xt** and **prev_h** are the inputs and **prev_c** is the cell state. We then follow the formulas above step by step, and the final outputs are **ct** (the new cell state) and **ht** (the output). The code performs the computation in exactly the same order as the formulas above, so we do not explain it line by line here.
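To see the module in action, it can be unrolled over a few time steps by feeding its outputs back in as the next `prev_c` and `prev_h`. This is only a sketch, assuming a working Torch7 installation with nn and nngraph. The helper `build_lstm`, the small layer size of 4 (instead of 400) and the random inputs are hypothetical choices for illustration.

```lua
require 'nn'
require 'nngraph'

-- Hypothetical helper: builds the same LSTM cell for a given layer size.
local function build_lstm(size)
  local xt, prev_c, prev_h = nn.Identity()(), nn.Identity()(), nn.Identity()()
  local function new_input_sum()
    return nn.CAddTable()({nn.Linear(size, size)(xt), nn.Linear(size, size)(prev_h)})
  end
  local input_gate = nn.Sigmoid()(new_input_sum())
  local forget_gate = nn.Sigmoid()(new_input_sum())
  local output_gate = nn.Sigmoid()(new_input_sum())
  local gt = nn.Tanh()(new_input_sum())
  local ct = nn.CAddTable()({nn.CMulTable()({forget_gate, prev_c}),
                             nn.CMulTable()({input_gate, gt})})
  local ht = nn.CMulTable()({output_gate, nn.Tanh()(ct)})
  return nn.gModule({xt, prev_c, prev_h}, {ct, ht})
end

local size = 4
local cell = build_lstm(size)

-- Start from a zero cell state and a zero hidden state
local c, h = torch.zeros(size), torch.zeros(size)
for t = 1, 3 do
  -- forward returns a table {ct, ht} because the module has two outputs
  local out = cell:forward({torch.randn(size), c, h})
  -- Clone, since the module reuses its output buffers on the next call
  c, h = out[1]:clone(), out[2]:clone()
end
print(c)
print(h)
```

Reusing the same `cell` at every step shares the weights across time, which is what makes the unrolled network recurrent.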