Recurrent Neural Network (RNN)
What are recurrent neural networks?
A recurrent neural network (RNN) is a type of artificial neural network designed for sequential data or time-series data. These deep learning algorithms are commonly used for ordinal or temporal problems such as language translation, natural language processing (NLP), and speech recognition; they are incorporated into popular applications such as Siri, voice search, and Google Translate. Like feedforward and convolutional neural networks (CNNs), RNNs learn from training data; what distinguishes them is an internal memory that lets information from earlier inputs influence the current output.
Why Recurrent Neural Networks?
- RNNs were created because feed-forward neural networks have a few limitations:
- Cannot handle sequential data
- Considers only the current input
- Cannot memorize previous inputs
- The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs. RNNs can memorize previous inputs thanks to their internal memory; a minimal sketch of this recurrence is shown below.
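The following is a minimal NumPy sketch (not from the original article) of how a vanilla RNN carries a hidden state across time steps. The layer sizes, weight initialization, and variable names are arbitrary, chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes for illustration only.
input_size, hidden_size, seq_len = 4, 8, 5
rng = np.random.default_rng(0)

# Parameters of a single vanilla RNN cell.
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the "memory" path)
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Run a vanilla RNN over a sequence of input vectors.

    The hidden state h is carried from step to step, which is how the
    network "remembers" previously received inputs.
    """
    h = np.zeros(hidden_size)          # initial memory is empty
    hidden_states = []
    for x_t in inputs:                 # process the sequence one step at a time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states

sequence = [rng.standard_normal(input_size) for _ in range(seq_len)]
states = rnn_forward(sequence)
print(len(states), states[-1].shape)   # 5 hidden states, each of size 8
```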
LSTM (Variant of RNN)
- Starting at the beginning, let’s define our neural net framework. When working with sequential data (NLP, time series, etc.), it’s common practice to use a Recurrent Neural Network (RNN).
- RNNs can be viewed as a chain of copies of the same network, where the output (hidden state) of one step is fed as an input to the next step.
- However, RNNs are not perfect. During training there is potential for unstable gradients. That is where Long Short-Term Memory (LSTM) cells come in.
- LSTMs are specialized RNN cells that regulate incoming and outgoing information using gates (a rough sketch of one LSTM step follows below).
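Here is a rough NumPy sketch of a single LSTM step, assuming the standard forget/input/output gating equations. The parameter names (W_f, W_i, W_o, W_c and their biases) and the sizes are illustrative assumptions, not taken from the original article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, and what to expose.

    params is a dict of weight matrices and biases; each W_* maps the
    concatenated vector [h_prev, x_t] to the hidden size.
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate: keep or erase old memory
    i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate: how much new info to write
    o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate: how much memory to expose
    g = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate memory content
    c = f * c_prev + i * g                           # new cell state (long-term memory)
    h = o * np.tanh(c)                               # new hidden state (short-term output)
    return h, c

# Tiny example with hypothetical sizes.
hidden, inp = 3, 2
rng = np.random.default_rng(1)
params = {}
for name in ("f", "i", "o", "c"):
    params[f"W_{name}"] = rng.standard_normal((hidden, hidden + inp)) * 0.1
    params[f"b_{name}"] = np.zeros(hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_cell(rng.standard_normal(inp), h, c, params)
print(h.shape, c.shape)   # (3,) (3,)
```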
What are Vanishing Gradients?
- The vanishing gradient problem occurs when the derivatives (slopes) get smaller and smaller as we move backward through the layers during backpropagation.
- When the weight updates become very small (exponentially small), training takes much longer, and in the worst case the neural network may stop learning entirely.
- The vanishing gradient problem commonly occurs with the sigmoid and tanh activation functions because their derivatives lie between 0 and 0.25 (sigmoid) and between 0 and 1 (tanh). Multiplying many such factors together during backpropagation makes the weight updates tiny, so the new weight values end up nearly identical to the old ones. We can mitigate this problem with the ReLU activation function, whose gradient is 0 for negative or zero input and 1 for positive input. A small numeric illustration follows below.
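As a rough illustration (with hypothetical layer depths, not from the original article), the product of per-layer sigmoid derivatives shrinks geometrically, since each factor is at most 0.25:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)           # never larger than 0.25

print(sigmoid_derivative(0.0))      # 0.25 -- the maximum possible value

# Backpropagation multiplies one such factor per layer (chain rule),
# so even in the best case the gradient shrinks quickly with depth:
for depth in (5, 10, 20):
    print(depth, 0.25 ** depth)     # e.g. 20 layers -> ~9.1e-13
```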
What are Exploding Gradients?
- The exploding gradient problem occurs when the derivatives (slopes) get larger and larger as we move backward through the layers during backpropagation. This situation is the exact opposite of vanishing gradients.
- It can occur with the ReLU activation function, since ReLU has no upper bound on its output.
- This problem is caused by the weights rather than the activation function. When the weight values are high, the derivatives are also high, so each new weight differs greatly from the old one and the gradient never converges. Training may end up oscillating around the minima and never reach the global minimum.
- Gradient clipping: a technique used to cope with the exploding gradient problem sometimes encountered during backpropagation. By capping the maximum magnitude of the gradient, the phenomenon is controlled in practice (a sketch is shown below).
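A minimal sketch of gradient clipping by L2 norm, assuming a simple NumPy gradient vector; the function name and the max_norm threshold are illustrative choices, not from the original article.

```python
import numpy as np

def clip_gradient_by_norm(grad, max_norm=1.0):
    """Rescale the gradient if its L2 norm exceeds max_norm.

    This caps the size of each update step, so one exploding gradient
    cannot throw the weights far away from the minimum.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# Example: a huge gradient gets rescaled to unit norm before the update.
g = np.array([300.0, -400.0])       # norm = 500
print(clip_gradient_by_norm(g))     # [ 0.6 -0.8], norm = 1.0
```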
*** *** ***
