How to add regularizations in TensorFlow?

PythonNeural NetworkTensorflowDeep Learning

Python Problem Overview


I found in many available neural network code implemented using TensorFlow that regularization terms are often implemented by manually adding an additional term to loss value.

My questions are:

  1. Is there a more elegant or recommended way of regularization than doing it manually?

  2. I also find that get_variable has an argument regularizer. How should it be used? According to my observation, if we pass a regularizer to it (such as tf.contrib.layers.l2_regularizer, a tensor representing regularized term will be computed and added to a graph collection named tf.GraphKeys.REGULARIZATOIN_LOSSES. Will that collection be automatically used by TensorFlow (e.g. used by optimizers when training)? Or is it expected that I should use that collection by myself?

Python Solutions


Solution 1 - Python

As you say in the second point, using the regularizer argument is the recommended way. You can use it in get_variable, or set it once in your variable_scope and have all your variables regularized.

The losses are collected in the graph, and you need to manually add them to your cost function like this.

  reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
  reg_constant = 0.01  # Choose an appropriate one.
  loss = my_normal_loss + reg_constant * sum(reg_losses)

Hope that helps!

Solution 2 - Python

A few aspects of the existing answer were not immediately clear to me, so here is a step-by-step guide:

  1. Define a regularizer. This is where the regularization constant can be set, e.g.:

     regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
    
  2. Create variables via:

         weights = tf.get_variable(
             name="weights",
             regularizer=regularizer,
             ...
         )
    

    Equivalently, variables can be created via the regular weights = tf.Variable(...) constructor, followed by tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, weights).

  3. Define some loss term and add the regularization term:

     reg_variables = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
     reg_term = tf.contrib.layers.apply_regularization(regularizer, reg_variables)
     loss += reg_term
    

    Note: It looks like tf.contrib.layers.apply_regularization is implemented as an AddN, so more or less equivalent to sum(reg_variables).

Solution 3 - Python

I'll provide a simple correct answer since I didn't find one. You need two simple steps, the rest is done by tensorflow magic:

  1. Add regularizers when creating variables or layers:

    tf.layers.dense(x, kernel_regularizer=tf.contrib.layers.l2_regularizer(0.001))
    # or
    tf.get_variable('a', regularizer=tf.contrib.layers.l2_regularizer(0.001))
    
  2. Add the regularization term when defining loss:

    loss = ordinary_loss + tf.losses.get_regularization_loss()
    

Solution 4 - Python

Another option to do this with the contrib.learn library is as follows, based on the Deep MNIST tutorial on the Tensorflow website. First, assuming you've imported the relevant libraries (such as import tensorflow.contrib.layers as layers), you can define a network in a separate method:

def easier_network(x, reg):
    """ A network based on tf.contrib.learn, with input `x`. """
    with tf.variable_scope('EasyNet'):
        out = layers.flatten(x)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=200,
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = tf.nn.tanh)
        out = layers.fully_connected(out, 
                num_outputs=10, # Because there are ten digits!
                weights_initializer = layers.xavier_initializer(uniform=True),
                weights_regularizer = layers.l2_regularizer(scale=reg),
                activation_fn = None)
        return out 

Then, in a main method, you can use the following code snippet:

def main(_):
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])

    # Make a network with regularization
    y_conv = easier_network(x, FLAGS.regu)
    weights = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'EasyNet') 
    print("")
    for w in weights:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")
    reg_ws = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 'EasyNet')
    for w in reg_ws:
        shp = w.get_shape().as_list()
        print("- {} shape:{} size:{}".format(w.name, shp, np.prod(shp)))
    print("")

    # Make the loss function `loss_fn` with regularization.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    loss_fn = cross_entropy + tf.reduce_sum(reg_ws)
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss_fn)

To get this to work you need to follow the MNIST tutorial I linked to earlier and import the relevant libraries, but it's a nice exercise to learn TensorFlow and it's easy to see how the regularization affects the output. If you apply a regularization as an argument, you can see the following:

- EasyNet/fully_connected/weights:0 shape:[784, 200] size:156800
- EasyNet/fully_connected/biases:0 shape:[200] size:200
- EasyNet/fully_connected_1/weights:0 shape:[200, 200] size:40000
- EasyNet/fully_connected_1/biases:0 shape:[200] size:200
- EasyNet/fully_connected_2/weights:0 shape:[200, 10] size:2000
- EasyNet/fully_connected_2/biases:0 shape:[10] size:10

- EasyNet/fully_connected/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_1/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0
- EasyNet/fully_connected_2/kernel/Regularizer/l2_regularizer:0 shape:[] size:1.0

Notice that the regularization portion gives you three items, based on the items available.

With regularizations of 0, 0.0001, 0.01, and 1.0, I get test accuracy values of 0.9468, 0.9476, 0.9183, and 0.1135, respectively, showing the dangers of high regularization terms.

Solution 5 - Python

If anyone's still looking, I'd just like to add on that in tf.keras you may add weight regularization by passing them as arguments in your layers. An example of adding L2 regularization taken wholesale from the Tensorflow Keras Tutorials site:

model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

There's no need to manually add in the regularization losses with this method as far as I know.

Reference: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_weight_regularization

Solution 6 - Python

I tested tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) and tf.losses.get_regularization_loss() with one l2_regularizer in the graph, and found that they return the same value. By observing the value's quantity, I guess reg_constant has already make sense on the value by setting the parameter of tf.contrib.layers.l2_regularizer.

Solution 7 - Python

If you have CNN you may do the following:

In your model function:

conv = tf.layers.conv2d(inputs=input_layer,
			            filters=32,
			            kernel_size=[3, 3],
			            kernel_initializer='xavier',
			            kernel_regularizer=tf.contrib.layers.l2_regularizer(1e-5),
			            padding="same",
			            activation=None) 
...

In your loss function:

onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
regularization_losses = tf.losses.get_regularization_losses()
loss = tf.add_n([loss] + regularization_losses)

Solution 8 - Python

Some answers make me more confused.Here I give two methods to make it clearly.

#1.adding all regs by hand
var1 = tf.get_variable(name='v1',shape=[1],dtype=tf.float32)
var2 = tf.Variable(name='v2',initial_value=1.0,dtype=tf.float32)
regularizer = tf.contrib.layers.l1_regularizer(0.1)
reg_term = tf.contrib.layers.apply_regularization(regularizer,[var1,var2])
#here reg_term is a scalar

#2.auto added and read,but using get_variable
with tf.variable_scope('x',
        regularizer=tf.contrib.layers.l2_regularizer(0.1)):
    var1 = tf.get_variable(name='v1',shape=[1],dtype=tf.float32)
    var2 = tf.get_variable(name='v2',shape=[1],dtype=tf.float32)
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
#here reg_losses is a list,should be summed 

Then,it can be added into the total loss

Solution 9 - Python

cross_entropy = tf.losses.softmax_cross_entropy(
  logits=logits, onehot_labels=labels)

l2_loss = weight_decay * tf.add_n(
     [tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])
  
loss = cross_entropy + l2_loss

Solution 10 - Python

tf.GraphKeys.REGULARIZATION_LOSSES will not be added automatically, but there is a simple way to add them:

reg_loss = tf.losses.get_regularization_loss()
total_loss = loss + reg_loss

tf.losses.get_regularization_loss() uses tf.add_n to sum the entries of tf.GraphKeys.REGULARIZATION_LOSSES element-wise. tf.GraphKeys.REGULARIZATION_LOSSES will typically be a list of scalars, calculated using regularizer functions. It gets entries from calls to tf.get_variable that have the regularizer parameter specified. You can also add to that collection manually. That would be useful when using tf.Variable and also when specifying activity regularizers or other custom regularizers. For instance:

#This will add an activity regularizer on y to the regloss collection
regularizer = tf.contrib.layers.l2_regularizer(0.1)
y = tf.nn.sigmoid(x)
act_reg = regularizer(y)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, act_reg)

(In this example it would presumably be more effective to regularize x, as y really flattens out for large x.)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLifu HuangView Question on Stackoverflow
Solution 1 - PythonLukasz KaiserView Answer on Stackoverflow
Solution 2 - Pythonbluenote10View Answer on Stackoverflow
Solution 3 - PythonalyaxeyView Answer on Stackoverflow
Solution 4 - PythonComputerScientistView Answer on Stackoverflow
Solution 5 - PythonevantkchongView Answer on Stackoverflow
Solution 6 - PythonoceanView Answer on Stackoverflow
Solution 7 - Pythontsveti_ikoView Answer on Stackoverflow
Solution 8 - Pythonuser3201329View Answer on Stackoverflow
Solution 9 - PythonAlex-zhaiView Answer on Stackoverflow
Solution 10 - PythonElias HasleView Answer on Stackoverflow