What's the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?

Neural Network · Tensorflow · Softmax · Cross Entropy

Neural Network Problem Overview


I recently came across tf.nn.sparse_softmax_cross_entropy_with_logits and I cannot figure out what the difference is compared to tf.nn.softmax_cross_entropy_with_logits.

Is the only difference that training vectors y have to be one-hot encoded when using sparse_softmax_cross_entropy_with_logits?

Reading the API, I was unable to find any other difference compared to softmax_cross_entropy_with_logits. But why do we need the extra function then?

Shouldn't softmax_cross_entropy_with_logits produce the same results as sparse_softmax_cross_entropy_with_logits, if it is supplied with one-hot encoded training data/vectors?

Neural Network Solutions


Solution 1 - Neural Network

Having two different functions is a convenience, as they produce the same result.

The difference is simple:

  • For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
  • For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.

Labels used in softmax_cross_entropy_with_logits are the one-hot version of the labels used in sparse_softmax_cross_entropy_with_logits.

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to get a loss of 0 for that label.
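To make the two label formats concrete, here is a minimal sketch (using made-up logits and the TF 1.x-style API shown elsewhere on this page) that feeds the same targets to both ops, once as integer class indices and once as their one-hot expansion:

import tensorflow as tf

# Made-up logits for a batch of 2 examples and 3 classes.
logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])

# sparse_* expects integer class indices, shape [batch_size], dtype int32/int64.
sparse_labels = tf.constant([0, 1], dtype=tf.int64)

# softmax_* expects one row of class probabilities per example,
# shape [batch_size, num_classes], dtype float32/float64.
dense_labels = tf.one_hot(sparse_labels, depth=3)

loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)
loss_dense = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([loss_sparse, loss_dense]))  # two equal length-2 loss vectors

Both calls return a per-example loss vector of shape [batch_size]; in practice you would usually pass it through tf.reduce_mean before minimizing.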

Solution 2 - Neural Network

I would just like to add two things to the accepted answer that you can also find in the TF documentation.

First:

> tf.nn.softmax_cross_entropy_with_logits
>
> NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

Second:

> tf.nn.sparse_softmax_cross_entropy_with_logits
>
> NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry).
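As an illustration of that distinction, a soft target distribution is perfectly legal for softmax_cross_entropy_with_logits but has no equivalent single index for the sparse variant; a minimal sketch with made-up numbers:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])

# A "soft" target: a valid probability distribution over the 3 classes
# that cannot be expressed as one integer index, so it is accepted by
# softmax_cross_entropy_with_logits but not by the sparse variant.
soft_labels = tf.constant([[0.7, 0.2, 0.1]])

loss = tf.nn.softmax_cross_entropy_with_logits(labels=soft_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run(loss))  # cross entropy against the soft distribution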

Solution 3 - Neural Network

Both functions compute the same result; sparse_softmax_cross_entropy_with_logits simply computes the cross entropy directly on the sparse labels instead of requiring them to be converted to one-hot encoding first.

You can verify this by running the following program:

import tensorflow as tf
from random import randint

dims = 8
pos  = randint(0, dims - 1)

# Random logits vector and the matching one-hot label for class `pos`.
logits = tf.random_uniform([dims], maxval=3, dtype=tf.float32)
labels = tf.one_hot(pos, dims)

# Dense (one-hot) labels vs. a single integer class index.
res1 = tf.nn.softmax_cross_entropy_with_logits(       logits=logits, labels=labels)
res2 = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.constant(pos))

with tf.Session() as sess:
    a, b = sess.run([res1, res2])
    print(a, b)
    print(a == b)

Here I create a random logits vector of length dims and generate one-hot encoded labels (where the element at index pos is 1 and the others are 0).

After that I calculate the softmax and sparse softmax cross entropies and compare their outputs. Try rerunning it a few times to verify that both always produce the same output.
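For intuition about why the two results match: with a one-hot target, the cross entropy sum collapses to the negative log-probability of the true class, which is exactly what the sparse op computes from the integer index. A small NumPy sketch of that reduction (made-up values):

import numpy as np

logits = np.array([2.0, 0.5, 1.0])
pos = 0  # index of the true class

# Softmax followed by cross entropy with a one-hot target:
# -sum_i y_i * log(p_i) keeps only the true-class term -log(p_pos).
probs = np.exp(logits) / np.sum(np.exp(logits))
print(-np.log(probs[pos]))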

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type                | Original Author  | Original Content on Stackoverflow
Question                    | daniel451        | View Question on Stackoverflow
Solution 1 - Neural Network | Olivier Moindrot | View Answer on Stackoverflow
Solution 2 - Neural Network | Drag0            | View Answer on Stackoverflow
Solution 3 - Neural Network | Salvador Dali    | View Answer on Stackoverflow