Debugging in TensorFlow

Last updated on Nov 01 2021
Goutam Joseph

Table of Contents

Debugging in TensorFlow

Debugging is a difficult and challenging task. We have to inspect the code we have written and identify problems through TensorFlow debugging. Typically, there are many guides, and the process of debugging is well documented for many languages and frameworks.

TensorFlow has its own debugger, called tfdbg, which lets us observe the internal workings and the state of the running graph. These internals are difficult to inspect with general-purpose debuggers such as pdb in Python.

This blog will teach us how to use the tfdbg CLI to debug the appearance of nans and infs, which are among the most common types of bugs found in TensorFlow. Given below is a low-level API example.

python -m tensorflow.python.debug.examples.debug_mnist

The code given above trains a neural network for MNIST digit image recognition, and the accuracy increases slightly before saturating after several training steps.

This behavior can be caused by infs and nans, which are among the most common bugs. Let us now use tfdbg to debug the issue and find out exactly where the problem starts.

Wrapping TensorFlow sessions with tfdbg

Add the lines of code below to use tfdbg, wrapping the session object with the debugger wrapper.

from tensorflow.python import debug as tf_debug
sess = tf_debug.LocalCLIDebugWrapperSession(sess)

The wrapper class offers several added features, including: the CLI can be invoked before and after each Session.run() call if we wish to take control of the execution and inspect the internal state of the graph.
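As a minimal, self-contained sketch of the wrapping step (the toy graph below is our own, assuming the TensorFlow 1.x Session API):

import tensorflow as tf
from tensorflow.python import debug as tf_debug

# A toy graph, purely for illustration.
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
w = tf.Variable([[1.0], [2.0]], name="w")
y = tf.matmul(x, w, name="y")

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Wrap the session: every subsequent run() call now stops at the tfdbg CLI.
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
print(sess.run(y, feed_dict={x: [[3.0, 4.0]]}))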

Filters can be added to assist the diagnosis. In the provided example, there is a pre-registered filter called tfdbg.has_inf_or_nan, which detects the presence of nan or inf values in any intermediate tensors, i.e. tensors that are neither inputs nor outputs of the Session.run() call.

We are always free to write our own custom filters to suit our needs; see the API documentation for additional information. A sketch of such a filter follows below.
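For instance, a hedged sketch of a custom filter (the filter name and the condition are purely illustrative) that flags any floating-point tensor containing a negative value:

import numpy as np

def has_negative(datum, tensor):
    # datum describes the dumped tensor; tensor is its value as a numpy array.
    return (isinstance(tensor, np.ndarray)
            and np.issubdtype(tensor.dtype, np.floating)
            and bool(np.any(tensor < 0.0)))

sess.add_tensor_filter("has_negative", has_negative)
# Then, at the tfdbg prompt:  run -f has_negative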

Debugging TensorFlow Model Training with tfdbg

It’s time to train the model again, this time including the --debug flag:

python -m tensorflow.python.debug.examples.debug_mnist --debug

The fetched data will be displayed on the screen and will look like the image shown below:

[Image: fetched data]

The picture above shows the run-start interface. At this point, enter r at the prompt:

tfdbg> run

This makes the TensorFlow debugger run the next Session.run() call, which calculates the accuracy on the test dataset.

For example:

[Image: tfdbg run output]

We can list the dumped intermediate tensors using the lt command after we have executed run.

Frequently-Used TensorFlow Debugging Commands

Try the following commands at the tfdbg> prompt. Note that whenever we enter a command, a new screen output is shown; this is analogous to web pages in a browser. We can navigate between these screens by clicking the <- and -> text arrows near the top-left corner of the CLI.
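As a quick reference (drawn from the tfdbg CLI's built-in help; exact availability may vary slightly across TensorFlow versions), the commands used most often include:

lt — list the intermediate tensors dumped in the current run
pt <tensor_name> — print the value of a dumped tensor (supports slicing, as in the pt example shown later)
ni <node_name> — show information about a node (add -t for the creation traceback)
li <node_name> / lo <node_name> — list the inputs / the recipients of a node's output
run (or r) — continue to the next Session.run() call (-f <filter> runs until a filter triggers)
ri — show information about the current run, such as fetches and feeds
/<regex> — search the current screen output and highlight matches
help — list all commands and their flags; quit — exit the CLI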

Features of tfdbg CLI

In addition to the TensorFlow debugging commands listed above, the tfdbg CLI offers the following capabilities:

To navigate through previous tfdbg commands, type in a few characters followed by the Up or Down arrow keys; tfdbg will show the history of commands that began with those characters.

To navigate through the history of screen outputs, do either of the following:

Click the underlined <- and -> hyperlinks near the top-left corner of the screen.

To redirect the screen output to a file instead of the screen, end the command with a redirection. For example, the following command redirects the output of the pt command to the file /tmp/xent_value_slices.txt:

tfdbg> pt cross_entropy/Log:0[:, 0:10] > /tmp/xent_value_slices.txt

Finding nans and infs

In this first Session.run() call, there happen to be no problematic numerical values. We can move on to the next run by using the command run or its shorthand r.

We can also use the -t flag to move ahead a number of Session.run() calls at a time. Instead of stepping through manually, we can let the debugger keep executing Session.run() calls without stopping at the run-start or run-end prompt until the first nan or inf value appears in the graph. This is analogous to conditional breakpoints in a procedural-language debugger:

tfdbg> run -f has_inf_or_nan
The preceding command works because a tensor filter called has_inf_or_nan was registered when the wrapped session was created. We can also register a custom filter of our own, for example one that flags scalar zero-valued tensors:
def my_filter_callable(datum, tensor):
    return len(tensor.shape) == 0 and tensor == 0

sess.add_tensor_filter('my_filter', my_filter_callable)
Then, at the tfdbg run-start prompt, run until the custom filter is triggered:
tfdbg> run -f my_filter

See the API documentation for more information on the expected signature and return value of the predicate callable used with add_tensor_filter().

[Image: add tensor filter]

As the screen shows on the first line, the has_inf_or_nan filter is first triggered during the fourth Session.run() call: an Adam optimizer forward-backward training pass on the graph. In this run, 36 intermediate tensors contain nan or inf values. To view the value of one of these tensors, use the pt command:

tfdbg> pt cross_entropy/Log:0
Scroll down a little and we will notice some scattered inf values. If the instances of inf and nan are hard to spot by eye, we can use the following command to perform a regex search and highlight the output:
tfdbg> /inf
Or, alternatively:
tfdbg> /(inf|nan)
We can also use the -s or --numeric_summary flag to get a summary of the kinds of numeric values in the tensor:
tfdbg> pt -s cross_entropy/Log:0
We can see that several of the thousand elements of the cross_entropy/Log:0 tensor are -infs (negative infinities). To inspect the node that produced this tensor, use the ni command:

tfdbg> ni cross_entropy/Log
[Image: node info showing infs]

We see that this node has the op type Log and that its input is the node Softmax. Run the following command to take a closer look at the input tensor:

tfdbg> pt Softmax:0

Examine the values in the input tensor, searching for zeros:

tfdbg> /0\.000

Now it is clear that the origin of the bad numerical values is the node cross_entropy/Log taking the log of zeros.

To find the culprit line in the Python source code, use the -t flag of the ni command to show the traceback of the node’s construction:

tfdbg> ni -t cross_entropy/Log

If we click “node_info” at the top of the screen, tfdbg automatically shows the traceback of the node’s creation.

From the traceback, we see that the op is constructed at the following line in debug_mnist.py:

diff = y_ * tf.log(y)

tfdbg can annotate lines of a Python source file with the ops or tensors created by them.
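For example, assuming the ps (print source) command that recent versions of the tfdbg CLI provide, a source file can be printed with each line annotated by the ops it created (the path below is illustrative):

tfdbg> ps /path/to/debug_mnist.py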

[Image: annotated Python source]

Fixing the Problem in TensorFlow Debugging

To fix the problem, edit debug_mnist.py, changing the original line:

diff = -(y_ * tf.log(y))
to the built-in, numerically stable implementation of softmax cross-entropy:
diff = tf.losses.sparse_softmax_cross_entropy(labels=y_, logits=logits)
Rerun with the --debug flag as given below:
python -m tensorflow.python.debug.examples.debug_mnist --debug
At the tfdbg> prompt, enter the following command:
run -f has_inf_or_nan

Confirm that no tensors are flagged as containing nan or inf values, and that the accuracy now keeps rising instead of getting stuck.
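For intuition on why the original formulation broke, here is a small standalone NumPy sketch (not part of the example script): when a softmax probability underflows to exactly 0, its log becomes -inf, and multiplying that by a zero label produces nan.

import numpy as np

probs = np.array([0.7, 0.3, 0.0], dtype=np.float32)  # last probability underflowed to 0
log_probs = np.log(probs)                             # [-0.36, -1.20, -inf]
print(log_probs)
print(np.float32(0.0) * log_probs[2])                 # 0 * -inf = nan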

Debugging tf-learn Estimators and Experiments

An Experiment is a construct in tf.contrib.learn at a higher level than the Estimator. It provides a single interface for training and evaluating a model. To debug the train() and evaluate() calls on an Experiment object, we can use the keyword arguments train_monitors and eval_hooks when calling its constructor.

Example:

from tensorflow.python import debug as tf_debug

hooks = [tf_debug.LocalCLIDebugHook()]
ex = experiment.Experiment(classifier,
                           eval_input_fn=iris_input_fn,
                           train_input_fn=iris_input_fn,
                           train_steps=FLAGS.train_steps,
                           eval_delay_secs=0,
                           eval_steps=1,
                           train_monitors=hooks,
                           eval_hooks=hooks)
ex.train()
accuracy_score = ex.evaluate()["accuracy"]
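For a plain tf-learn Estimator (without an Experiment), a hedged sketch of the equivalent pattern passes the same hook through fit() and evaluate(); the classifier and the input functions below are placeholders:

from tensorflow.python import debug as tf_debug

hooks = [tf_debug.LocalCLIDebugHook()]
# classifier is any tf.contrib.learn Estimator; the input functions are hypothetical.
classifier.fit(input_fn=my_train_input_fn, steps=1000, monitors=hooks)
accuracy = classifier.evaluate(input_fn=my_eval_input_fn, hooks=hooks)["accuracy"]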
To build and run the debug_tflearn_iris example in Experiment mode:
python -m tensorflow.python.debug.examples.debug_tflearn_iris \
  --use_experiment --debug

The LocalCLIDebugHook also allows us to configure a watch_fn that can be used to flexibly specify which Tensors to watch on different Session.run() calls, as a function of the fetches and feed_dict, among other states.

Debugging Keras Models with the help of tfdbg

To use TFDBG with Keras, let the Keras backend use a TFDBG-wrapped session object. To use the CLI wrapper in the debugging process:

import tensorflow as tf
from tensorflow.python import debug as tf_debug
from keras import backend as keras_backend

keras_backend.set_session(tf_debug.LocalCLIDebugWrapperSession(tf.Session()))
# Define the Keras model, called "model", here.
model.fit(...)  # This will break into the TFDBG CLI.
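Below is a slightly fuller, hedged sketch (TensorFlow 1.x with standalone Keras; the tiny model and the random data are purely illustrative):

import numpy as np
import tensorflow as tf
from tensorflow.python import debug as tf_debug
from keras import backend as keras_backend
from keras.models import Sequential
from keras.layers import Dense

# Route all Keras computation through a tfdbg-wrapped session.
keras_backend.set_session(tf_debug.LocalCLIDebugWrapperSession(tf.Session()))

model = Sequential([
    Dense(16, activation="relu", input_shape=(4,)),
    Dense(3, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")

x = np.random.rand(32, 4).astype("float32")
y = np.eye(3, dtype="float32")[np.random.randint(0, 3, size=32)]
model.fit(x, y, epochs=1)  # Each underlying Session.run() drops into the tfdbg CLI.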

Debugging tf-slim with tfdbg

TFDBG supports debugging of training and evaluation with tf-slim. Training and evaluation require slightly different debugging workflows.

Debugging training with tf-slim

To debug the training process, provide LocalCLIDebugWrapperSession to the session_wrapper argument of slim.learning.train():

import tensorflow as tf
from tensorflow.python import debug as tf_debug

# ... Code for creating the graph and the train_op ...
tf.contrib.slim.learning.train(
    train_op,
    logdir,
    number_of_steps=10,
    session_wrapper=tf_debug.LocalCLIDebugWrapperSession)

Debugging evaluation

To debug the evaluation process, provide LocalCLIDebugHook to the hooks argument of slim.evaluation.evaluate_once():

import tensorflow as tf
from tensorflow.python import debug as tf_debug

# ... Code that creates the graph and the eval ops ...
tf.contrib.slim.evaluation.evaluate_once(
    '',
    checkpoint_path,
    logdir,
    eval_op=my_eval_op,
    final_op=my_value_op,
    hooks=[tf_debug.LocalCLIDebugHook()])

Offline Debugging of Remotely-Running Sessions

Often, a model runs on a remote machine or in a process that we do not have terminal access to. To perform model debugging in such cases, we can use the offline_analyzer binary of tfdbg, which operates on dumped data directories. This works with both the lower-level Session API and the higher-level Estimator and Experiment APIs.

Debugging Remote tf.Sessions

If we interact directly with the tf.Session API in Python, we can configure the RunOptions proto that we call the Session.run() method with, using the method tfdbg.watch_graph(), so that the intermediate tensors are dumped to a shared storage location of our choice.

For instance:
from tensorflow.python import debug as tf_debug

# ... Code that creates the session and the graph ...
run_options = tf.RunOptions()
tf_debug.watch_graph(
    run_options,
    graph,
    debug_urls=["file:///shared/storage/location/tfdbg_dumps_1"])
# Be sure to specify different directories for different run() calls.
session.run(fetches, feed_dict=feeds, options=run_options)

In an environment that we do have terminal access to (for example, a local machine that can access the shared storage location specified in the code above), we can load and inspect the data in the dump directory on the shared storage using the offline_analyzer binary of tfdbg.

Example:


python -m tensorflow.python.debug.cli.offline_analyzer \
  --dump_dir=/shared/storage/location/tfdbg_dumps_1

The DumpingDebugWrapperSession offers an easier way to generate file-system dumps that can be analyzed offline. To use it, wrap our session in a tf_debug.DumpingDebugWrapperSession.

Example:

from tensorflow.python import debug as tf_debug

sess = tf_debug.DumpingDebugWrapperSession(
    sess, "/shared/storage/location/tfdbg_dumps_1/", watch_fn=my_watch_fn)

The watch_fn argument accepts a Callable that allows us to configure which tensors to watch on different Session.run() calls, as a function of the fetches and feed_dict of the run() call, among other states.
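A hedged sketch of such a watch_fn (the node-name pattern is illustrative, and the exact WatchOptions field names can differ slightly between TensorFlow versions):

from tensorflow.python import debug as tf_debug

def my_watch_fn(fetches, feeds):
    # Watch only nodes whose names match a pattern, dumping both identity
    # values and numeric summaries for them.
    return tf_debug.WatchOptions(
        debug_ops=["DebugIdentity", "DebugNumericSummary"],
        node_name_regex_whitelist=r"hidden.*")

sess = tf_debug.DumpingDebugWrapperSession(
    sess, "/shared/storage/location/tfdbg_dumps_1/", watch_fn=my_watch_fn)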

C++ and other languages

If our model code is written in C++ or other languages, we can also modify the debug_options field of RunOptions to generate debug dumps that can be inspected offline. See the proto definition for more information.

Debugging Remotely-Running tf-learn Estimators and Experiments

For remotely running tf-learn Estimators and Experiments, we can use the non-interactive DumpingDebugHook:

from tensorflow.python import debug as tf_debug

hooks = [tf_debug.DumpingDebugHook("/shared/storage/location/tfdbg_dumps_1")]

This hook can then be used in the same manner as the LocalCLIDebugHook examples described earlier in this article. As the training or evaluation of the Estimator or Experiment takes place, tfdbg creates directories with the following name pattern: /shared/storage/location/tfdbg_dumps_1/run_<epoch_timestamp_microsec>_<uuid>. Each directory corresponds to a Session.run() call underlying the fit() or evaluate() call. We can load these directories and inspect them offline in a command-line interface using the offline_analyzer supplied by tfdbg.

python -m tensorflow.python.debug.cli.offline_analyzer \
  --dump_dir="/shared/storage/location/tfdbg_dumps_1/run_<epoch_timestamp_microsec>_<uuid>"

So, this brings us to the end of the blog. This Tecklearn ‘Debugging in TensorFlow’ blog helps you with commonly asked questions if you are looking for a job in Artificial Intelligence. If you wish to learn Artificial Intelligence and build a career in the AI or Machine Learning domain, then check out our interactive Artificial Intelligence and Deep Learning with TensorFlow Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/artificial-intelligence-and-deep-learning-with-tensorflow/

Artificial Intelligence and Deep Learning with TensorFlow Training

About the Course

Tecklearn’s Artificial Intelligence and Deep Learning with Tensor Flow course is curated by industry professionals as per the industry requirements & demands and aligned with the latest best practices. You’ll master convolutional neural networks (CNN), TensorFlow, TensorFlow code, transfer learning, graph visualization, recurrent neural networks (RNN), Deep Learning libraries, GPU in Deep Learning, Keras and TFLearn APIs, backpropagation, and hyperparameters via hands-on projects. The trainee will learn AI by mastering natural language processing, deep neural networks, predictive analytics, reinforcement learning, and more programming languages needed to shine in this field.

Why Should you take Artificial Intelligence and Deep Learning with Tensor Flow Training?

  • According to Paysa.com, an Artificial Intelligence Engineer earns an average of $171,715, ranging from $124,542 at the 25th percentile to $201,853 at the 75th percentile, with top earners earning more than $257,530.
  • Worldwide spending on Artificial Intelligence systems will be nearly $98 billion in 2023, according to the new IDC Spending Guide, growing at a CAGR of 28.5%.
  • IBM, Amazon, Apple, Google, Facebook, Microsoft, Oracle and almost all the leading companies are working on Artificial Intelligence to innovate future technologies.

What you will Learn in this Course?

Introduction to Deep Learning and AI

  • What is Deep Learning?
  • Advantage of Deep Learning over Machine learning
  • Real-Life use cases of Deep Learning
  • Review of Machine Learning: Regression, Classification, Clustering, Reinforcement Learning, Underfitting and Overfitting, Optimization
  • Pre-requisites for AI & DL
  • Python Programming Language
  • Installation & IDE

Environment Set Up and Essentials

  • Installation
  • Python – NumPy
  • Python for Data Science and AI
  • Python Language Essentials
  • Python Libraries – Numpy and Pandas
  • Numpy for Mathematical Computing

More Prerequisites for Deep Learning and AI

  • Pandas for Data Analysis
  • Machine Learning Basic Concepts
  • Normalization
  • Data Set
  • Machine Learning Concepts
  • Regression
  • Logistic Regression
  • SVM – Support Vector Machines
  • Decision Trees
  • Python Libraries for Data Science and AI

Introduction to Neural Networks

  • Creating Module
  • Neural Network Equation
  • Sigmoid Function
  • Multi-layered perceptron
  • Weights, Biases
  • Activation Functions
  • Gradient Descent or Error function
  • Epoch, Forward & backward propagation
  • What is TensorFlow?
  • TensorFlow code-basics
  • Graph Visualization
  • Constants, Placeholders, Variables

Multi-layered Neural Networks

  • Error Back propagation issues
  • Drop outs

Regularization techniques in Deep Learning

Deep Learning Libraries

  • Tensorflow
  • Keras
  • OpenCV
  • SkImage
  • PIL

Building of Simple Neural Network from Scratch from Simple Equation

  • Training the model

Dual Equation Neural Network

  • TensorFlow
  • Predicting Algorithm

Introduction to Keras API

  • Define Keras
  • How to compose Models in Keras
  • Sequential Composition
  • Functional Composition
  • Predefined Neural Network Layers
  • What is Batch Normalization
  • Saving and Loading a model with Keras
  • Customizing the Training Process
  • Using TensorBoard with Keras
  • Use-Case Implementation with Keras

GPU in Deep Learning

  • Introduction to GPUs and how they differ from CPUs
  • Importance of GPUs in training Deep Learning Networks
  • The GPU constituent with simpler core and concurrent hardware
  • Keras Model Saving and Reusing
  • Deploying Keras with TensorBoard

Keras Cat Vs Dog Modelling

  • Activation Functions in Neural Network

Optimization Techniques

  • Some Examples for Neural Network

Convolutional Neural Networks (CNN)

  • Introduction to CNNs
  • CNNs Application
  • Architecture of a CNN
  • Convolution and Pooling layers in a CNN
  • Understanding and Visualizing a CNN

RNN: Recurrent Neural Networks

  • Introduction to RNN Model
  • Application use cases of RNN
  • Modelling sequences
  • Training RNNs with Backpropagation
  • Long Short-Term memory (LSTM)
  • Recursive Neural Tensor Network Theory
  • Recurrent Neural Network Model

Application of Deep Learning in image recognition, NLP and more

Real world projects in recommender systems and others

Got a question for us? Please mention it in the comments section and we will get back to you.
