Tensorflow: How to replace or modify gradient?

Python · TensorFlow · Neural Network

Python Problem Overview


I would like to replace or modify the gradient of an op or portion of the graph in TensorFlow. It would be ideal if I could use the existing gradient in the calculation.

In some ways this is the opposite of what tf.stop_gradient() does: instead of adding a calculation which is ignored when computing gradients, I want a calculation which is used only when computing gradients.

A simple example would be something which simply scales gradients by multiplying them by a constant (but does not multiply the forward calculation by that constant). Another example would be something which clips the gradients to a given range.

Python Solutions


Solution 1 - Python

For TensorFlow 1.7 and TensorFlow 2.0, see the edit below.


First define your custom gradient:

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
  return 5.0 * grad

Since you want nothing to happen in the forward pass, override the gradient of an identity operation with your new gradient:

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
  output = tf.identity(input, name="Identity")

Here is a working example with a layer that clips gradients in the backward pass and does nothing in the forward pass, using the same method:

import tensorflow as tf

@tf.RegisterGradient("CustomClipGrad")
def _clip_grad(unused_op, grad):
  return tf.clip_by_value(grad, -0.1, 0.1)

input = tf.Variable([3.0], dtype=tf.float32)

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomClipGrad"}):
  output_clip = tf.identity(input, name="Identity")
grad_clip = tf.gradients(output_clip, input)

# output without gradient clipping in the backwards pass for comparison:
output = tf.identity(input)
grad = tf.gradients(output, input)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print("with clipping:", sess.run(grad_clip)[0])
  print("without clipping:", sess.run(grad)[0])

Edit for TensorFlow 1.7 and TensorFlow 2.0

Since TensorFlow 1.7 there has been a new, shorter way to redefine the gradient, which also works with TensorFlow 2.0. It also allows you to redefine the gradient of multiple operations at the same time. Here are the examples from above, rewritten for TensorFlow 1.7 and TensorFlow 2.0:

Layer that scales gradients in the backward pass:

@tf.custom_gradient
def scale_grad_layer(x):
  def grad(dy):
    return 5.0 * dy
  return tf.identity(x), grad

Example with a layer that clips gradients in the backward pass:

@tf.custom_gradient
def clip_grad_layer(x):
  def grad(dy):
    return tf.clip_by_value(dy, -0.1, 0.1)
  return tf.identity(x), grad
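
In TensorFlow 2.x you can quickly verify either layer with tf.GradientTape. A minimal sketch, reusing the clip_grad_layer defined above:

x = tf.constant([3.0])
with tf.GradientTape() as tape:
  tape.watch(x)
  y = clip_grad_layer(x)
# the forward pass is the identity, the backward pass is clipped to [-0.1, 0.1]
print(y)                    # tf.Tensor([3.], shape=(1,), dtype=float32)
print(tape.gradient(y, x))  # tf.Tensor([0.1], shape=(1,), dtype=float32)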

Solution 2 - Python

Assuming the forward computation is

y = f(x)

And you want it to backpropagate like

y = b(x)

A simple hack will be:

y = b(x) + tf.stop_gradient(f(x) - b(x))
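
For example, here is a straight-through estimator built with this trick (a minimal sketch assuming TensorFlow 2.x eager execution; f(x) = tf.round(x) and b(x) = x are just illustrative choices):

import tensorflow as tf

x = tf.constant([0.3, 1.7])
with tf.GradientTape() as tape:
    tape.watch(x)
    # forward: y = round(x); backward: behaves like y = x
    y = x + tf.stop_gradient(tf.round(x) - x)
print(y)                    # [0., 2.]
print(tape.gradient(y, x))  # [1., 1.], the gradient of b(x) = x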

Solution 3 - Python

1. Use optimizer.compute_gradients or tf.gradients to get the original gradients.
2. Then modify them however you want.
3. Finally, call optimizer.apply_gradients with the modified gradients.

I found an example on GitHub.
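
A minimal sketch of those three steps (TF 1.x graph mode; loss and the trainable variables are assumed to be defined elsewhere, and clipping is just one example of what you might do in between):

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# 1. get the original (gradient, variable) pairs
grads_and_vars = optimizer.compute_gradients(loss)

# 2. do whatever you want with them, e.g. clip each gradient
processed = [(tf.clip_by_value(g, -1.0, 1.0), v)
             for g, v in grads_and_vars if g is not None]

# 3. apply the modified gradients
train_op = optimizer.apply_gradients(processed)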

Solution 4 - Python

The most general way to do that is by using tf.RegisterGradient: https://www.tensorflow.org/api_docs/python/tf/RegisterGradient

Below, I implemented backpropagated gradient clipping, which can be used with matmul, as shown here, or any other op:

import tensorflow as tf
import numpy as np

# from https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    
    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

def clip_grad(x, clip_value, name=None):
    """"
    scales backpropagated gradient so that
    its L2 norm is no more than `clip_value`
    """
    with tf.name_scope(name, "ClipGrad", [x]) as name:
        return py_func(lambda x : x,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=lambda op, g : tf.clip_by_norm(g, clip_value))[0]

Example usage:

with tf.Session() as sess:
    x = tf.constant([[1., 2.], [3., 4.]])
    y = tf.constant([[1., 2.], [3., 4.]])

    print('without clipping')
    z = tf.matmul(x, y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping')
    z = tf.matmul(clip_grad(x, 1.0), clip_grad(y, 0.5))
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

    print('with clipping between matmuls')
    z = tf.matmul(clip_grad(tf.matmul(x, y), 1.0), y)
    print(tf.gradients(tf.reduce_sum(z), x)[0].eval())

Output:

without clipping
[[ 3.  7.]
 [ 3.  7.]]
with clipping
[[ 0.278543   0.6499337]
 [ 0.278543   0.6499337]]
with clipping between matmuls
[[ 1.57841039  3.43536377]
 [ 1.57841039  3.43536377]]

Solution 5 - Python

For TensorFlow 2, you should use the tf.custom_gradient decorator as follows:

@tf.custom_gradient
def func(x):
    f = ...  # calculate forward pass
    def grad(dy):
        gradient = ...  # calculate custom gradient of func
        return dy * gradient
    return f, grad
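
For instance, a filled-in version where the forward pass is x ** 3 and the analytic gradient 3 * x ** 2 is supplied by hand (a minimal sketch, using a hypothetical cube function for illustration):

@tf.custom_gradient
def cube(x):
    f = x ** 3                   # forward pass
    def grad(dy):
        gradient = 3 * x ** 2    # custom gradient of cube
        return dy * gradient     # multiply by the upstream gradient
    return f, grad

x = tf.constant(2.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = cube(x)
print(tape.gradient(y, x))  # tf.Tensor(12.0, shape=(), dtype=float32)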

Note that you must multiply the custom gradient by the upstream gradient dy. Be wary though!

If you call this as a function when creating a Keras functional model and use tf.GradientTape, then automatic differentiation will still take place, and your custom gradient will be ignored.

Instead, you must put your function into a layer:

class func_layer(tf.keras.layers.Layer):
    def __init__(self):
        super(func_layer, self).__init__()

    def call(self, x):
        return func(x)

Now, when you add a func_layer to your functional model, the backward pass will be calculated appropriately.
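
For example (a minimal sketch; the input shape is made up, and func is assumed to have been filled in with a concrete forward pass and gradient, e.g. the cube sketch above):

inputs = tf.keras.Input(shape=(4,))
outputs = func_layer()(inputs)
model = tf.keras.Model(inputs, outputs)

x = tf.random.normal((2, 4))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)
grads = tape.gradient(y, x)  # now uses the custom gradient defined in func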

Solution 6 - Python

For current TensorFlow r1.13, use tf.custom_gradient.

The decorated function (whose input arguments form a list x) should return

  • the result of the forward pass, and
  • a function which returns a list of gradients, one for each element in x.

Here's an example with one variable:

@tf.custom_gradient
def non_differentiable(x):
    f = tf.cast(x > 0, tf.float32)
    def grad(dy):
        return tf.math.maximum(0., 1 - tf.abs(x))
    return f, grad

And one with two:

@tf.custom_gradient
def non_differentiable2(x0, x1):
    f = x0 * tf.cast(x1 > 0, tf.float32)
    def grad(dy):
        df_dx0 = tf.cast(x1 > 0, tf.float32)
        return dy*df_dx0, tf.zeros_like(dy)
    return f, grad
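
A quick check of the first example with tf.gradients (a minimal sketch in TF 1.x graph mode; the input values are just for illustration):

x = tf.constant([-0.5, 0.2, 3.0])
y = non_differentiable(x)
dy_dx = tf.gradients(y, x)[0]

with tf.Session() as sess:
    print(sess.run(y))       # [0. 1. 1.]
    print(sess.run(dy_dx))   # [0.5 0.8 0. ], from the custom grad function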

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Alex I | View Question on Stackoverflow
Solution 1 - Python | BlueSun | View Answer on Stackoverflow
Solution 2 - Python | Bily | View Answer on Stackoverflow
Solution 3 - Python | xxi | View Answer on Stackoverflow
Solution 4 - Python | MWB | View Answer on Stackoverflow
Solution 5 - Python | Alex Trevithick | View Answer on Stackoverflow
Solution 6 - Python | cheersmate | View Answer on Stackoverflow