Long Short-Term Memory has transformed and improved the performance of tools such as Google's speech recognition and automatic translation, and Amazon's Alexa responses. The reason lies in its ability to memorize, which I will elaborate on below.
Unlike conventional artificial neural networks, which cannot "remember" values and information across time intervals, Long Short-Term Memory (LSTM) is an artificial neural network architecture that can process sequential information.
In other words, LSTMs can process information that depends on remembering each element of a sequence, such as the frames of a video. In this article I cover how Long Short-Term Memory neural networks work and describe their major components.
Long Short-Term Memory neural networks are a special kind of recurrent neural network able to learn dependencies in sequences of information. Their main applications include natural language processing, audio processing, and processing sequences of video frames.
We now know that the concept behind Long Short-Term Memory, or LSTM, is the use of sequential information. But how does it work?
LSTM, like every artificial neural network architecture, is bio-inspired: it works in a way analogous to certain biological processes. Convolutional neural networks are inspired by how the visual cortex works in the human brain; LSTM, on the other hand, is rooted in the way memory works. Human memory is commonly divided into two types: short-term memory and long-term memory.
Short-term memory: operates when information is acquired. Information is retained for only a few seconds and is then either kept for longer periods or discarded.
Long-term memory: retains information, allowing it to be retrieved or recalled later. All the knowledge we have is "stored" in long-term memory.
In summary, the LSTM architecture consists of a set of memory cells connected recursively. This design is what enabled the improvements mentioned earlier; I will detail how it works below.
LSTM memory cells, or blocks (Figure 1), retain and manipulate information through gates that control the flow of information between cells. Each cell has three kinds of gates, as follows:
Forget gate: decides which information should be discarded, in other words "forgotten", by the memory cell.
Input gate: decides which new information is added to the memory cell's state.
Output gate: extracts useful information from the current memory cell's state and forwards it to the subsequent memory cell.
Each gate applies a particular mathematical function to the memory cell's information, according to its role: forget, input, or output.
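To make the gate mechanism concrete, here is a minimal NumPy sketch of the three gate computations for a single time step. All sizes, weights, and variable names here are illustrative assumptions, not taken from the source; in practice the weights are learned during training.

```python
import numpy as np

def sigmoid(z):
    # Squashes values into (0, 1): near 0 means "block", near 1 means "let through".
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes chosen for illustration.
n_in, n_hid = 4, 3
rng = np.random.default_rng(0)

# One weight matrix and bias per gate; each gate sees the current
# input x_t concatenated with the previous cell's output h_{t-1}.
W_f, W_i, W_o = (rng.standard_normal((n_hid, n_in + n_hid)) for _ in range(3))
b_f, b_i, b_o = (np.zeros(n_hid) for _ in range(3))

x_t = rng.standard_normal(n_in)   # input at the current time step (Xt)
h_prev = np.zeros(n_hid)          # output of the previous cell (ht-1)
z = np.concatenate([x_t, h_prev])

f_t = sigmoid(W_f @ z + b_f)  # forget gate: what to discard from the state
i_t = sigmoid(W_i @ z + b_i)  # input gate: what to add to the state
o_t = sigmoid(W_o @ z + b_o)  # output gate: what to expose to the next cell
```

Each gate outputs a vector of values between 0 and 1, which is multiplied element-wise against the information flowing through the cell, letting some of it pass and suppressing the rest.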
Figure 1. LSTM memory block. Source: Data Science Academy (2021).
In addition to the gates, the LSTM memory cell has two inputs, represented by Xt and ht-1, where Xt is the input at the current time step and ht-1 is the output of the previous cell.
The long-term memory, or cell state, is connected to every gate, as shown in Figure 1. The cell's output, which serves as input to the next cell, is represented by Zt.
Finally, the cell also contains an input modulation gate, which computes a candidate value from the current input and the previous cell's output.
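Putting the pieces together, the sketch below runs a full cell update over a short sequence: the input modulation gate produces a candidate value, the forget and input gates update the cell state (the long-term memory), and the output gate produces the value passed to the next cell. Again, all names and sizes are illustrative assumptions, not the source's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update. W and b pack the four transformations
    (forget, input, output, modulation) row-wise, n_hid rows each."""
    n = h_prev.size
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W[0*n:1*n] @ z + b[0*n:1*n])   # forget gate
    i = sigmoid(W[1*n:2*n] @ z + b[1*n:2*n])   # input gate
    o = sigmoid(W[2*n:3*n] @ z + b[2*n:3*n])   # output gate
    g = np.tanh(W[3*n:4*n] @ z + b[3*n:4*n])   # input modulation: candidate value
    c = f * c_prev + i * g    # cell state: forget part of the old, add part of the new
    h = o * np.tanh(c)        # cell output, passed to the next cell (Zt in Figure 1)
    return h, c

# Illustrative sizes: 2 input features, 3 hidden units, a sequence of 5 steps.
n_in, n_hid, T = 2, 3, 5
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((T, n_in)):  # process the sequence element by element
    h, c = lstm_step(x_t, h, c, W, b)
```

The loop is the "recursive connection" described above: each cell's output h and state c become the next cell's inputs, which is how information from earlier elements of the sequence can influence later ones.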
In this article, we have discussed what LSTMs are used for, how their architecture works, and their key element: the memory cell. LSTM applications include language translation, text generation, chatbots, audio transcription, image captioning, and action recognition in videos.
Data Science Academy (2021) 'Deep Learning Book'. Available at: <https://www.deeplearningbook.com.br/> (Accessed: 29 July 2021).
Van Houdt, G., Mosquera, C. and Nápoles, G. (2020) 'A review on the long short-term memory model', Artificial Intelligence Review, 53, pp. 5929-5955. Available at: <https://doi.org/10.1007/s10462-020-09838-1> (Accessed: 29 July 2021).