Gated Recurrent Unit - An Introduction

raviteja · Dec 3, 2020

To understand GRUs, please go through this article before reading this one.

Let us start with a simple example. Consider the following review of a movie.

If we read the whole review, we can understand the tone of the movie. But when asked to give the review ourselves, our mind remembers only the decisive words, such as ‘blockbuster’, ‘legendary’, ‘thrills’, ‘satisfying’, etc. This is exactly what a GRU does.

To get a general understanding of what GRUs are, I suggest you start with the articles below and then come back here.

ARCHITECTURE

We have seen that in an RNN, during backpropagation, the gradient becomes smaller and smaller until it is practically irrelevant. Hence the earlier layers do not learn, and because of this an RNN cannot remember long-term information well. GRUs were created to solve this problem.
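To make the vanishing-gradient idea concrete, here is a tiny NumPy illustration (my own addition, not from the original article): the gradient that flows back through many time steps of a vanilla RNN is roughly a product of per-step factors, and when those factors are smaller than 1 the product shrinks toward zero.

```python
import numpy as np

# Toy illustration of the vanishing gradient in a vanilla RNN.
# Each backward step multiplies the gradient by roughly
# (recurrent weight) * tanh'(pre-activation); tanh' is at most 1,
# so with a small recurrent weight the product shrinks quickly.
np.random.seed(0)

w_hh = 0.5      # recurrent weight (a scalar here, for simplicity)
grad = 1.0      # gradient arriving at the last time step

for t in range(50):                                  # backpropagate through 50 steps
    tanh_derivative = np.random.uniform(0.1, 1.0)    # tanh'(x) lies in (0, 1]
    grad *= w_hh * tanh_derivative

print(f"gradient reaching the first time step: {grad:.3e}")  # vanishingly small
```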

GRUs have gates which help decide what information to remember and what to forget, hence the name Gated Recurrent Units. A GRU has two gates: the reset gate and the update gate. These gates learn which information to keep and which to throw away.

The tanh activation function is used to make sure that values always stay between -1 and 1, since extremely large values could otherwise explode. The sigmoid function does something similar, but squashes values between 0 and 1. GRUs are like LSTMs, but they use a single hidden state instead of the LSTM’s separate cell state to transfer information.
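As a rough sketch of how these pieces fit together (the variable names and the exact update convention below are my own assumptions, not something given in the article), a single GRU step can be written in NumPy like this: both gates use the sigmoid, the candidate state uses tanh, and there is only one hidden state vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell_step(x_t, h_prev, params):
    """One GRU time step (one common formulation; biases omitted for brevity)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params

    z_t = sigmoid(x_t @ W_z + h_prev @ U_z)               # update gate, in (0, 1)
    r_t = sigmoid(x_t @ W_r + h_prev @ U_r)               # reset gate, in (0, 1)
    h_cand = np.tanh(x_t @ W_h + (r_t * h_prev) @ U_h)    # candidate state, in (-1, 1)

    # Interpolate between the old hidden state and the new candidate.
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand
    return h_t

# Tiny usage example: one step with random weights.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
params = [rng.normal(size=s) for s in
          [(input_size, hidden_size), (hidden_size, hidden_size)] * 3]
h = gru_cell_step(rng.normal(size=input_size), np.zeros(hidden_size), params)
print(h)
```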

Update gate — Decides which information to throw away and what new information to add. It plays the role of the forget and input gates of an LSTM combined.

Reset gate — Decides how much of the past information to forget (see the short sketch below).
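To see what the two gates do in isolation, the sketch below (my own illustration, using the same convention as the cell above, where a z close to 1 means “take the new candidate”) pushes each gate to its extremes.

```python
import numpy as np

h_prev = np.array([0.9, -0.4, 0.7])   # hidden state carried from the past
h_cand = np.array([0.1,  0.8, -0.2])  # freshly computed candidate state

# Update gate: blends the old state and the candidate elementwise.
for z in (0.0, 0.5, 1.0):
    h_new = (1.0 - z) * h_prev + z * h_cand
    print(f"z = {z}: h_new = {h_new}")   # z=0 keeps the past, z=1 takes the candidate

# Reset gate: scales how much of the past enters the candidate computation.
for r in (0.0, 1.0):
    print(f"r = {r}: past seen by candidate = {r * h_prev}")  # r=0 ignores the past
```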

Having understood what an LSTM is, a GRU can be seen as a newer, simpler variant of it. GRUs have fewer tensor operations than LSTMs, so they are faster to train.
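As a quick sanity check of the “fewer operations” point (this snippet is my own addition and assumes PyTorch is available), comparing the parameter counts of a GRU and an LSTM with the same sizes shows the GRU is the smaller of the two, since it has three gating/candidate blocks per layer instead of four.

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

gru = nn.GRU(input_size=128, hidden_size=256, num_layers=1, batch_first=True)
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1, batch_first=True)

# GRU has 3 weight blocks per layer (reset, update, candidate),
# LSTM has 4 (input, forget, cell, output), so roughly a 3:4 ratio.
print("GRU parameters: ", count_params(gru))
print("LSTM parameters:", count_params(lstm))
```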

For a further empirical comparison of these architectures, you can refer to the paper “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling” (Chung et al., 2014).
