How to get stable results with TensorFlow, setting random seed

Python Problem Overview

I'm trying to run a neural network multiple times with different parameters in order to calibrate the networks parameters (dropout probabilities, learning rate e.d.). However I am having the problem that running the network while keeping the parameters the same still gives me a different solution when I run the network in a loop as follows:

filename = create_results_file()
for i in range(3):
  g = tf.Graph()
  with g.as_default():
    accuracy_result, average_error = network.train_network(
        parameters, inputHeight, inputWidth, inputChannels, outputClasses)
    f, w = get_csv_writer(filename)
    w.writerow([accuracy_result, "did run %d" % i, average_error])
    f.close()

I am using the following code at the start of my train_network function before setting up the layers and error function of my network:

np.random.seed(1)
tf.set_random_seed(1)

I have also tried adding this code before the TensorFlow graph creation, but I keep getting different solutions in my results output.

I am using an AdamOptimizer and am initializing network weights using tf.truncated_normal. Additionally I am using np.random.permutation to shuffle the incoming images for each epoch.

Python Solutions

Solution 1 - Python

Setting the current TensorFlow random seed affects the current default graph only. Since you are creating a new graph for your training and setting it as default (with g.as_default():), you must set the random seed within the scope of that with block.

For example, your loop should look like the following:

for i in range(3):
  g = tf.Graph()
  with g.as_default():
    tf.set_random_seed(1)
    accuracy_result, average_error = network.train_network(
        parameters, inputHeight, inputWidth, inputChannels, outputClasses)

Note that this will use the same random seed for each iteration of the outer for loop. If you want to use a different—but still deterministic—seed in each iteration, you can use tf.set_random_seed(i + 1).

Solution 2 - Python

Deterministic behaviour can be obtained either by supplying a graph-level or an operation-level seed. Both worked for me. A graph-level seed can be placed with tf.set_random_seed. An operation-level seed can be placed e.g, in a variable intializer as in:

myvar = tf.Variable(tf.truncated_normal(((10,10)), stddev=0.1, seed=0))

Solution 3 - Python

Tensorflow 2.0 Compatible Answer: For Tensorflow version greater than 2.0, if we want to set the Global Random Seed, the Command used is tf.random.set_seed.

If we are migrating from Tensorflow Version 1.x to 2.x, we can use the command, tf.compat.v2.random.set_seed.

Note that tf.function acts like a re-run of a program in this case.

To set the Operation Level Seed (as answered above), we can use the command, tf.random.uniform([1], seed=1).

For more details, refer this Tensorflow Page.

Solution 4 - Python

Backend Setup: cuda:10.1, cudnn: 7, tensorflow-gpu: 2.1.0, keras: 2.2.4-tf, and vgg19 customized model

After looking into the issue of unstable results for tensorflow backend with GPU training and large neural network models based on keras, I was finally able to get reproducible (stable) results as follows:

Import only those libraries that would be required for setting seed and initialize a seed value

import tensorflow as tf
import os
import numpy as np
import random

SEED = 0

Function to initialize seeds for all libraries which might have stochastic behavior

def set_seeds(seed=SEED):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)

Activate Tensorflow deterministic behavior

def set_global_determinism(seed=SEED):
    set_seeds(seed=seed)

    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    
    tf.config.threading.set_inter_op_parallelism_threads(1)
    tf.config.threading.set_intra_op_parallelism_threads(1)

# Call the above function with seed value
set_global_determinism(seed=SEED)

Important notes:

Please call the above code before executing any other code
Model training might become slower since the code is deterministic, hence there's a tradeoff
I experimented several times with a varying number of epochs and different settings (including model.fit() with shuffle=True) and the above code gives me reproducible results.

References:

Solution 5 - Python

Please add all random seed functions before your code:

tf.reset_default_graph()
tf.random.set_seed(0)
random.seed(0)
np.random.seed(0)

I think, some models in TensorFlow are using numpy or the python random function.

Solution 6 - Python

It seems as if none of these answers will work due to underlying implementation issues in CuDNN.

You can get a bit more determinism by adding an extra flag

os.environ['PYTHONHASHSEED']=str(SEED)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # new flag present in tf 2.0+
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)

But this still won't be entirely deterministic. To get an even more exact solution, you need use the procedure outlined in this nvidia repo.

Solution 7 - Python

I'm using TensorFlow 2 (2.2.0) and I'm running code in JupyterLab. I've tested this in macOS Catalina and in Google Colab with same results. I'll add some stuff to Tensorflow Support's answer.

When I do some training using the model.fit() method I do it in a cell. I do some other stuff in other cells. This is the code I run in the mentioned cell:

# To have same results always place this on top of the cell
tf.random.set_seed(1)

(x_train, y_train), (x_test, y_test) = load_data_onehot_grayscale()
model = get_mlp_model_compiled() # Creates the model, compiles it and returns it

history = model.fit(x=x_train, y=y_train,
                    epochs=30,
                    callbacks=get_mlp_model_callbacks(),
                    validation_split=.1,
                   )

This is what I understand:

TensorFlow has some random processes that happen at different stages (initializing, shuffling, ...), every time those processes happen TensorFlow uses a random function. When you set the seed using tf.random.set_seed(1) you make those processes use it and if the seed is set and the processes don't change the results will be the same.
Now, in the code above, if I change tf.random.set_seed(1) to go below the line model = get_mlp_model_compiled() my results change, I believe it's because get_mlp_model_compiled() uses randomness and isn't using the seed I want.
Caveat about point 2: if I run the cell 3 times in a row I do get same results. I believe this happens because, in run nº1 get_mlp_model_compiled() isn't using TensorFlow's internal counter with my seed. In run nº2 it will be using a seed and all subsequent runs it will be using the seed too so after run nº2 results will be the same.

I might have some information wrong so feel free to correct me.

To understand what's going on you should read the docs, they're not so long and kind of easy to understand.

Solution 8 - Python

This answer is an addition to Luke's answer and for TF v2.2.0

import numpy as np
import os
import random
import tensorflow as tf # 2.2.0

SEED = 42
os.environ['PYTHONHASHSEED']=str(SEED)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # TF 2.1
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

Content Type	Original Author	Original Content on Stackoverflow
Question	Waanders	View Question on Stackoverflow
Solution 1 - Python	mrry	View Answer on Stackoverflow
Solution 2 - Python	ssegvic	View Answer on Stackoverflow
Solution 3 - Python	Tensorflow Support	View Answer on Stackoverflow
Solution 4 - Python	Dan	View Answer on Stackoverflow
Solution 5 - Python	Zhihui Shao	View Answer on Stackoverflow
Solution 6 - Python	Luke	View Answer on Stackoverflow
Solution 7 - Python	loco.loop	View Answer on Stackoverflow
Solution 8 - Python	Harsh	View Answer on Stackoverflow