Long short-term memory (LSTM) RNN in TensorFlow

Last updated on Oct 06 2022
Goutam Joseph


Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. It was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. Unlike standard feed-forward neural networks, LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video).

For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition and speech recognition.

A general LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. LSTM is well-suited to classifying, processing, and predicting time series given time lags of unknown duration.

Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural networks, which makes it easier to remember past data in memory.

  1. Input gate- It discovers which values from the input should be used to modify the memory. A sigmoid function decides which values to let through (0 or 1), and a tanh function gives weightage to the values that are passed, deciding their level of importance on a scale from -1 to 1.
it = sigmoid(Wi · [ht-1, xt] + bi)
C̃t = tanh(WC · [ht-1, xt] + bC)
  2. Forget gate- It discovers which details should be discarded from the block; a sigmoid function decides this. It looks at the previous state (ht-1) and the content input (xt) and outputs a number between 0 (omit this) and 1 (keep this) for each number in the cell state Ct-1.
ft = sigmoid(Wf · [ht-1, xt] + bf)
  3. Output gate- The input and the memory of the block are used to decide the output. A sigmoid function decides which values to let through (0 or 1), and a tanh function gives weightage to the values that are passed, deciding their level of importance on a scale from -1 to 1; this is multiplied by the output of the sigmoid.
ot = sigmoid(Wo · [ht-1, xt] + bo)
ht = ot * tanh(Ct)
Figure: a traditional RNN cell

The figure above represents a full RNN cell that takes the current input of the sequence, xi, outputs the current hidden state, hi, and passes it to the next RNN cell for our input sequence. The inside of an LSTM cell is a lot more complicated than a traditional RNN cell: the conventional RNN cell has a single internal layer acting on the current state (ht-1) and input (xt).
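Putting the gate equations above together, a single LSTM time step can be sketched in plain NumPy. This is an illustrative toy, not the TensorFlow implementation: the stacked weight layout and the toy sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [h_prev, x_t] to the four
    gate pre-activations (input, forget, output, candidate), stacked."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    i_t = sigmoid(z[0:hidden])             # input gate, values in (0, 1)
    f_t = sigmoid(z[hidden:2*hidden])      # forget gate, values in (0, 1)
    o_t = sigmoid(z[2*hidden:3*hidden])    # output gate, values in (0, 1)
    c_tilde = np.tanh(z[3*hidden:])        # candidate values, in (-1, 1)
    c_t = f_t * c_prev + i_t * c_tilde     # keep part of the old cell state, add new
    h_t = o_t * np.tanh(c_t)               # new hidden state
    return h_t, c_t

# Toy dimensions: hidden size 4, input size 3 (assumed for illustration)
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.standard_normal((4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inputs), h, c, W, b)
```

Because the output gate is a sigmoid and the cell state passes through tanh, every component of the hidden state stays strictly between -1 and 1.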

Figure: an unrolled LSTM network

In the above diagram, we see an "unrolled" LSTM network with an embedding layer, a subsequent LSTM layer, and a sigmoid activation function. Our inputs, in this case words in a movie review, are fed in sequentially.

The words are fed into an embedding lookup. In most cases, when working with a corpus of text data, the vocabulary is extremely large, so it is far more efficient to map each word to a dense embedding vector than to a one-hot vector.

The embedding is a multidimensional, distributed representation of words in a vector space. These embeddings can be learned with other deep learning techniques such as word2vec, or we can train the model in an end-to-end fashion to learn the embedding as we train.
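An embedding lookup is just row indexing into a trainable matrix. A minimal NumPy sketch, where the vocabulary size and embedding dimension are made-up values:

```python
import numpy as np

vocab_size, embed_dim = 10000, 300          # assumed sizes for illustration
rng = np.random.default_rng(1)
embedding = rng.standard_normal((vocab_size, embed_dim))  # one row per word

# A review encoded as word ids becomes a sequence of dense vectors
word_ids = np.array([42, 7, 901, 42])
embedded = embedding[word_ids]              # shape: (4, 300)
```

The same word id always maps to the same row, so repeated words share one vector; during training, gradients update those rows in place.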

These embeddings are then fed into our LSTM layer, whose output goes to a sigmoid output layer and to the LSTM cell for the next word in our sequence.

LSTM Layers

We will set up a function to build the LSTM layers so that the number of layers and their sizes are handled dynamically. The function will take a list of LSTM sizes, whose length indicates the number of LSTM layers (e.g., our example will use a list of length 2 containing the sizes 128 and 64, indicating a two-layer LSTM network where the first layer has hidden size 128 and the second has hidden size 64).

def build_lstm_layers(lstm_sizes, embed, keep_prob_, batch_size):
    """
    Create the LSTM layers
    """
    lstms = [tf.contrib.rnn.BasicLSTMCell(size) for size in lstm_sizes]
    # Add dropout to each cell
    drops = [tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob_) for lstm in lstms]
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell(drops)
    # Get an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)
    lstm_outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
    return initial_state, lstm_outputs, cell, final_state

The list of dropout-wrapped LSTM cells is then passed to a TensorFlow MultiRNNCell to stack the layers together.

Loss function, optimizer and accuracy

Finally, we create functions to define our model's loss function, optimizer, and accuracy. Even though the loss and accuracy are just calculated from the results, in TensorFlow everything is part of a computation graph.

def build_cost_fn_and_opt(lstm_outputs, labels_, learning_rate):
    """
    Create the loss function and optimizer
    """
    predictions = tf.contrib.layers.fully_connected(lstm_outputs[:, -1], 1, activation_fn=tf.sigmoid)
    loss = tf.losses.mean_squared_error(labels_, predictions)
    optimizer = tf.train.AdadeltaOptimizer(learning_rate).minimize(loss)
    return predictions, loss, optimizer

def build_accuracy(predictions, labels_):
    """
    Create accuracy
    """
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    return accuracy
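Outside the computation graph, the same two quantities are easy to check with plain NumPy. The toy predictions and labels below are made-up values, not the model's real outputs:

```python
import numpy as np

predictions = np.array([0.9, 0.2, 0.6, 0.4])    # sigmoid outputs in (0, 1)
labels = np.array([1, 0, 0, 0])

# Mean squared error between labels and sigmoid outputs
loss = np.mean((labels - predictions) ** 2)

# Round each prediction to 0 or 1, then take the fraction that match
correct = (np.round(predictions).astype(int) == labels)
accuracy = correct.mean()
```

Here the third prediction (0.6) rounds to 1 against a label of 0, so three of the four examples are correct.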

Building the graph and training

First, we call each of the functions we have defined to construct the network, and then we call a TensorFlow session to train the model over a predefined number of epochs using mini-batches. At the end of every epoch, we print the loss, training accuracy, and validation accuracy to monitor the results as we train the model.

def build_and_train_network(lstm_sizes, vocab_size, embed_size, epochs, batch_size,
                            learning_rate, keep_prob, train_x, val_x, train_y, val_y):
    # Build graph
    with tf.Session() as sess:
        # Train network
        # Save network
        pass
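Mini-batching itself needs no TensorFlow. A simple generator sketch (the function and variable names are illustrative, not from the original code):

```python
import numpy as np

def get_batches(x, y, batch_size):
    """Yield full mini-batches of (x, y). A trailing partial batch is
    dropped, matching the fixed batch_size the graph's initial state expects."""
    n_batches = len(x) // batch_size
    for i in range(0, n_batches * batch_size, batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]

# Toy data: 10 examples, batch size 4 -> 2 full batches, last 2 rows dropped
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(get_batches(x, y, 4))
```

Dropping the partial batch keeps every batch the same shape, which matters here because cell.zero_state is built for a fixed batch_size.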

Next, we define our model hyperparameters. We will build a two-layer LSTM network with hidden layer sizes of 128 and 64, respectively.
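A plausible set of hyperparameter definitions is sketched below. Only lstm_sizes and epochs come from the text and the training log; every other value is an assumption for illustration:

```python
lstm_sizes = [128, 64]   # two-layer LSTM: hidden sizes 128 then 64 (from the text)
vocab_size = 10000       # assumed vocabulary size
embed_size = 300         # assumed embedding dimension
epochs = 50              # matches the "Epoch: n/50" training log
batch_size = 256         # assumed
learning_rate = 0.1      # assumed
keep_prob = 0.5          # assumed dropout keep probability
```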

When the model is done training, we use a TensorFlow saver to save out the model parameters for later use.

Epoch: 1/50 Batch: 303/303 Train Loss: 0.247 Train Accuracy: 0.562 Val Accuracy: 0.578
Epoch: 2/50 Batch: 303/303 Train Loss: 0.245 Train Accuracy: 0.583 Val Accuracy: 0.596
Epoch: 3/50 Batch: 303/303 Train Loss: 0.247 Train Accuracy: 0.597 Val Accuracy: 0.617
Epoch: 4/50 Batch: 303/303 Train Loss: 0.240 Train Accuracy: 0.610 Val Accuracy: 0.627
Epoch: 5/50 Batch: 303/303 Train Loss: 0.238 Train Accuracy: 0.620 Val Accuracy: 0.632
Epoch: 6/50 Batch: 303/303 Train Loss: 0.234 Train Accuracy: 0.632 Val Accuracy: 0.642
Epoch: 7/50 Batch: 303/303 Train Loss: 0.230 Train Accuracy: 0.636 Val Accuracy: 0.648
Epoch: 8/50 Batch: 303/303 Train Loss: 0.227 Train Accuracy: 0.641 Val Accuracy: 0.653
Epoch: 9/50 Batch: 303/303 Train Loss: 0.223 Train Accuracy: 0.646 Val Accuracy: 0.656
Epoch: 10/50 Batch: 303/303 Train Loss: 0.221 Train Accuracy: 0.652 Val Accuracy: 0.659

Testing

Finally, we check our model results on the test set to make sure they are in line with what we observed during training.

def test_network(model_dir, batch_size, test_x, test_y):
    # Build network
    with tf.Session() as sess:
        # Restore model
        # Test model
        pass

The test accuracy is about 72%. This is right in line with our validation accuracy and indicates that we captured an appropriate distribution of our data across our train, validation, and test splits.

INFO:tensorflow:Restoring parameters from checkpoints/sentiment.ckpt
Test Accuracy: 0.717

So, this brings us to the end of this blog. This Tecklearn 'Long short-term memory (LSTM) RNN in TensorFlow' blog helps you with commonly asked questions if you are looking out for a job in Artificial Intelligence. If you wish to learn Artificial Intelligence and build a career in the AI or Machine Learning domain, then check out our interactive Artificial Intelligence and Deep Learning with TensorFlow Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/artificial-intelligence-and-deep-learning-with-tensorflow/

Artificial Intelligence and Deep Learning with TensorFlow Training

About the Course

Tecklearn’s Artificial Intelligence and Deep Learning with Tensor Flow course is curated by industry professionals as per the industry requirements & demands and aligned with the latest best practices. You’ll master convolutional neural networks (CNN), TensorFlow, TensorFlow code, transfer learning, graph visualization, recurrent neural networks (RNN), Deep Learning libraries, GPU in Deep Learning, Keras and TFLearn APIs, backpropagation, and hyperparameters via hands-on projects. The trainee will learn AI by mastering natural language processing, deep neural networks, predictive analytics, reinforcement learning, and more, along with the programming languages needed to shine in this field.

Why Should you take Artificial Intelligence and Deep Learning with Tensor Flow Training?

  • According to Paysa.com, an Artificial Intelligence Engineer earns an average of $171,715, ranging from $124,542 at the 25th percentile to $201,853 at the 75th percentile, with top earners earning more than $257,530.
  • Worldwide spending on Artificial Intelligence systems will be nearly $98 billion in 2023, growing at a CAGR of 28.5%, according to a new IDC Spending Guide.
  • IBM, Amazon, Apple, Google, Facebook, Microsoft, Oracle and almost all the leading companies are working on Artificial Intelligence to innovate future technologies.

What you will Learn in this Course?

Introduction to Deep Learning and AI

  • What is Deep Learning?
  • Advantage of Deep Learning over Machine learning
  • Real-Life use cases of Deep Learning
  • Review of Machine Learning: Regression, Classification, Clustering, Reinforcement Learning, Underfitting and Overfitting, Optimization
  • Pre-requisites for AI & DL
  • Python Programming Language
  • Installation & IDE

Environment Set Up and Essentials

  • Installation
  • Python – NumPy
  • Python for Data Science and AI
  • Python Language Essentials
  • Python Libraries – Numpy and Pandas
  • Numpy for Mathematical Computing

More Prerequisites for Deep Learning and AI

  • Pandas for Data Analysis
  • Machine Learning Basic Concepts
  • Normalization
  • Data Set
  • Machine Learning Concepts
  • Regression
  • Logistic Regression
  • SVM – Support Vector Machines
  • Decision Trees
  • Python Libraries for Data Science and AI

Introduction to Neural Networks

  • Creating Module
  • Neural Network Equation
  • Sigmoid Function
  • Multi-layered perceptron
  • Weights, Biases
  • Activation Functions
  • Gradient Descent or Error function
  • Epoch, Forward & backward propagation
  • What is TensorFlow?
  • TensorFlow code-basics
  • Graph Visualization
  • Constants, Placeholders, Variables

Multi-layered Neural Networks

  • Error backpropagation issues
  • Dropouts

Regularization techniques in Deep Learning

Deep Learning Libraries

  • Tensorflow
  • Keras
  • OpenCV
  • SkImage
  • PIL

Building of Simple Neural Network from Scratch from Simple Equation

  • Training the model

Dual Equation Neural Network

  • TensorFlow
  • Predicting Algorithm

Introduction to Keras API

  • Define Keras
  • How to compose Models in Keras
  • Sequential Composition
  • Functional Composition
  • Predefined Neural Network Layers
  • What is Batch Normalization
  • Saving and Loading a model with Keras
  • Customizing the Training Process
  • Using TensorBoard with Keras
  • Use-Case Implementation with Keras

GPU in Deep Learning

  • Introduction to GPUs and how they differ from CPUs
  • Importance of GPUs in training Deep Learning Networks
  • The GPU constituent with simpler core and concurrent hardware
  • Keras Model Saving and Reusing
  • Deploying Keras with TensorBoard

Keras Cat Vs Dog Modelling

  • Activation Functions in Neural Network

Optimization Techniques

  • Some Examples for Neural Network

Convolutional Neural Networks (CNN)

  • Introduction to CNNs
  • CNNs Application
  • Architecture of a CNN
  • Convolution and Pooling layers in a CNN
  • Understanding and Visualizing a CNN

RNN: Recurrent Neural Networks

  • Introduction to RNN Model
  • Application use cases of RNN
  • Modelling sequences
  • Training RNNs with Backpropagation
  • Long Short-Term memory (LSTM)
  • Recursive Neural Tensor Network Theory
  • Recurrent Neural Network Model

Application of Deep Learning in image recognition, NLP and more

Real world projects in recommender systems and others

Got a question for us? Please mention it in the comments section and we will get back to you.
