What is the default kernel initializer in tf.layers.conv2d and tf.layers.dense?

Tensorflow Problem Overview


The official TensorFlow API documentation states that the kernel_initializer parameter defaults to None for tf.layers.conv2d and tf.layers.dense.

However, reading the layers tutorial (https://www.tensorflow.org/tutorials/layers), I noticed that this parameter is not set in the code. For example:

# Convolutional Layer #1
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

The example code from the tutorial runs without any errors, so the kernel must be initialized with something other than None. So which initializer is used?

In another script, I left kernel_initializer unset for the conv2d and dense layers, and everything worked. However, when I set kernel_initializer to tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32), I got NaN errors. What is going on here? Can anyone help?

Tensorflow Solutions


Solution 1 - Tensorflow

Great question! It takes some digging to find out.

  • The default is not documented in tf.layers.conv2d.
  • If you look at the definition of the function, you see that it calls variable_scope.get_variable:

self.kernel = vs.get_variable('kernel',
                              shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              trainable=True,
                              dtype=self.dtype)

Next step: what does the variable scope do when the initializer is None?

The get_variable documentation says:

> If initializer is None (the default), the default initializer passed in the constructor is used. If that one is None too, we use a new glorot_uniform_initializer.

So the answer is: it uses the glorot_uniform_initializer.

For completeness, here is the definition of this initializer:

> The Glorot uniform initializer, also called Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor. Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
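As a rough illustration of the definition quoted above, the following sketch computes the limit and draws samples using only the Python standard library. The helper names (compute_fans, glorot_uniform_limit, glorot_uniform_sample) are made up for this example, and the fan computation assumes a kernel laid out as (..., in_channels, out_channels) with any leading dimensions acting as the receptive field. It is not TensorFlow's own code.

```python
import math
import random


def compute_fans(shape):
    """Return (fan_in, fan_out) for a dense or convolutional kernel shape.

    Assumption: the last two dimensions are (in_channels, out_channels) and
    any leading dimensions form the receptive field (kernel height, width).
    """
    if len(shape) < 2:
        raise ValueError("kernel shape needs at least 2 dimensions")
    receptive_field = math.prod(shape[:-2])  # equals 1 for a dense kernel
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return fan_in, fan_out


def glorot_uniform_limit(shape):
    """limit = sqrt(6 / (fan_in + fan_out)), as in the quoted definition."""
    fan_in, fan_out = compute_fans(shape)
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform_sample(shape, rng=random):
    """Draw one value per kernel entry, uniformly from [-limit, limit].

    Returns a flat list; reshaping is left out to keep the sketch short.
    """
    limit = glorot_uniform_limit(shape)
    return [rng.uniform(-limit, limit) for _ in range(math.prod(shape))]


if __name__ == "__main__":
    # 5x5 kernel with 32 filters; a single input channel is an assumption.
    conv_shape = (5, 5, 1, 32)
    print(compute_fans(conv_shape), glorot_uniform_limit(conv_shape))
```

For the 5x5 kernel with 32 filters from the tutorial snippet, and assuming a single input channel, this gives fan_in = 25, fan_out = 800, and a limit of about 0.085.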

Edit: this is what I found in the code and documentation. You could verify that the initialization really behaves this way by evaluating the weights!

Solution 2 - Tensorflow

According to this course by Andrew Ng and the Xavier documentation, if you are using ReLU as the activation function, it is better to change the default weight initializer (which is Xavier uniform) to Xavier normal:

y = tf.layers.conv2d(x, kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False), )

Solution 3 - Tensorflow

2.0 Compatible Answer: In TensorFlow 2.0 as well, the default kernel initializer in tf.keras.layers.Conv2D and tf.keras.layers.Dense is glorot_uniform.

This is specified on the TensorFlow.org website.

The link for Conv2D is https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D?version=nightly#__init__

and the link for Dense is https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense?version=nightly#__init__
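One way to check the TensorFlow 2 default yourself is to build a layer without passing kernel_initializer and inspect it. The sketch below assumes TensorFlow 2.x with eager execution and NumPy installed; the layer sizes (128 inputs, 64 units) are arbitrary values chosen for the example.

```python
import math

import numpy as np
import tensorflow as tf

fan_in, fan_out = 128, 64  # arbitrary sizes, chosen only for illustration

# kernel_initializer is deliberately left unset so the default is used.
layer = tf.keras.layers.Dense(fan_out)
layer.build((None, fan_in))  # creates the kernel variable

# Name of the initializer class the layer resolved its default to.
initializer_name = type(layer.kernel_initializer).__name__

# Glorot uniform should keep every weight within [-limit, limit].
kernel = layer.kernel.numpy()
limit = math.sqrt(6.0 / (fan_in + fan_out))
max_abs = float(np.abs(kernel).max())

print(initializer_name, kernel.shape, max_abs <= limit + 1e-6)
```

The small tolerance in the final comparison only guards against float32 rounding at the boundary.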

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | daniszw | View Question on Stackoverflow
Solution 1 - Tensorflow | rmeertens | View Answer on Stackoverflow
Solution 2 - Tensorflow | xtluo | View Answer on Stackoverflow
Solution 3 - Tensorflow | Tensorflow Support | View Answer on Stackoverflow