How to inspect a TensorFlow .tfrecord file?

Python, TensorFlow, TFRecord

Python Problem Overview


I have a .tfrecord but I don't know how it is structured. How can I inspect the schema to understand what the .tfrecord file contains?

All Stack Overflow answers and documentation seem to assume I already know the structure of the file.

import tensorflow as tf

reader = tf.TFRecordReader()
file = tf.train.string_input_producer(["record.tfrecord"])  # expects a list of filenames
_, serialized_record = reader.read(file)

...HOW TO INSPECT serialized_record...

Python Solutions


Solution 1 - Python

Found it!

import tensorflow as tf

# TF 1.x API; under TF 2.x the same iterator is tf.compat.v1.io.tf_record_iterator
for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    print(tf.train.Example.FromString(example))

You can also convert each record to JSON:

from google.protobuf.json_format import MessageToJson
...
jsonMessage = MessageToJson(tf.train.Example.FromString(example))

Solution 2 - Python

The solutions above didn't work for me. With TF 2.0, use this:

import tensorflow as tf
raw_dataset = tf.data.TFRecordDataset("path-to-file")

for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)

https://www.tensorflow.org/tutorials/load_data/tfrecord#reading_a_tfrecord_file_2

Solution 3 - Python

If your .tfrecord contains SequenceExample records, the accepted answer won't show you everything. You can use:

import tensorflow as tf

for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    result = tf.train.SequenceExample.FromString(example)
    break
print(result)

This will show you the content of the first example.

Then you can also inspect individual Features using their keys:

result.context.feature["foo_key"]

And for FeatureLists:

result.feature_lists.feature_list["bar_key"]

Solution 4 - Python

An improvement on the accepted solution:

import tensorflow as tf
import json
from google.protobuf.json_format import MessageToJson

dataset = tf.data.TFRecordDataset("mydata.tfrecord")
for d in dataset:
    ex = tf.train.Example()
    ex.ParseFromString(d.numpy())
    m = json.loads(MessageToJson(ex))
    print(m['features']['feature'].keys())

In my case, I was running TF2 and a single example was too big to fit on my screen, so I converted it to a dictionary to inspect the keys (the accepted solution returns one long string).
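Building on the key listing above, the sketch below also reports which kind of list each feature holds and how many values it has, which is usually what "schema" means for a .tfrecord file. The helper names `summarize_example` and `summarize_file` and the `limit` parameter are made up for illustration. It assumes TF 2.x with eager execution and that the records are tf.train.Example messages.

```python
from collections import Counter


def summarize_example(example):
    """Return {feature_key: (kind, number_of_values)} for one tf.train.Example."""
    summary = {}
    for key, feature in example.features.feature.items():
        # A Feature is a oneof named "kind": bytes_list, float_list or int64_list.
        kind = feature.WhichOneof("kind")
        count = len(getattr(feature, kind).value) if kind else 0
        summary[key] = (kind, count)
    return summary


def summarize_file(path, limit=100):
    """Count the (key, kind) pairs seen in the first `limit` records of `path`."""
    import tensorflow as tf  # imported lazily; assumes TF 2.x eager execution

    seen = Counter()
    for raw_record in tf.data.TFRecordDataset(path).take(limit):
        example = tf.train.Example.FromString(raw_record.numpy())
        for key, (kind, _) in summarize_example(example).items():
            seen[(key, kind)] += 1
    return seen
```

Calling summarize_file("mydata.tfrecord") returns a Counter keyed by (feature key, kind). A key whose count is lower than the number of records scanned is probably optional.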

Solution 5 - Python

Use TensorFlow's tf.TFRecordReader with the tf.parse_single_example decoder, as described in https://www.tensorflow.org/programmers_guide/reading_data

PS: a .tfrecord file contains 'Example' records, which are defined in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto

Once you have extracted a record into a string, you can parse it like this:

a = tf.train.Example()
a.ParseFromString(binary_string_with_example_record)  # parses in place; the return value is not the message
result = a

However, I'm not sure where the raw support for extracting individual records from a file lives; you can track it down in TFRecordReader.
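For extracting individual records without TensorFlow, here is a minimal sketch based on the TFRecord framing: each record is a little-endian uint64 length, a uint32 CRC of the length, the payload bytes, and a uint32 CRC of the payload. The function name `read_raw_records` is made up for illustration. It assumes an uncompressed file and skips the CRC fields instead of verifying them.

```python
import struct


def read_raw_records(path):
    """Yield the payload bytes of each record in an uncompressed TFRecord file.

    The CRC fields are skipped, not verified, so corruption goes undetected.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            data = f.read(length)
            footer = f.read(4)  # uint32 CRC of the data
            if len(data) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            yield data
```

Each yielded byte string can then be passed to tf.train.Example.FromString, or to ParseFromString as shown above.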

Solution 6 - Python

If it's an option to install another Python package, tfrecord_lite is very convenient.

Example:

In [1]: import tensorflow as tf
   ...: from tfrecord_lite import decode_example
   ...:
   ...: it = tf.python_io.tf_record_iterator('nsynth-test.tfrecord')
   ...: decode_example(next(it))
   ...:
Out[1]:
{'audio': array([ 3.8138387e-06, -3.8721851e-06,  3.9331076e-06, ...,
        -3.6526076e-06,  3.7041993e-06, -3.7578957e-06], dtype=float32),
 'instrument': array([417], dtype=int64),
 'instrument_family': array([0], dtype=int64),
 'instrument_family_str': [b'bass'],
 'instrument_source': array([2], dtype=int64),
 'instrument_source_str': [b'synthetic'],
 'instrument_str': [b'bass_synthetic_033'],
 'note': array([149013], dtype=int64),
 'note_str': [b'bass_synthetic_033-100-100'],
 'pitch': array([100], dtype=int64),
 'qualities': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64),
 'sample_rate': array([16000], dtype=int64),
 'velocity': array([100], dtype=int64)}

You can install it with pip install tfrecord_lite.

Solution 7 - Python

I'd recommend the following script: tfrecord-view.

It enables convenient visual inspection of TFRecords using TF and OpenCV, although it needs a few modifications (for labels and such). See the further instructions inside the repository.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Bob van Luijt | View Question on Stack Overflow
Solution 1 - Python | Bob van Luijt | View Answer on Stack Overflow
Solution 2 - Python | amalik2205 | View Answer on Stack Overflow
Solution 3 - Python | rafi | View Answer on Stack Overflow
Solution 4 - Python | Astariul | View Answer on Stack Overflow
Solution 5 - Python | Yaroslav Bulatov | View Answer on Stack Overflow
Solution 6 - Python | Keunwoo Choi | View Answer on Stack Overflow
Solution 7 - Python | M_N1 | View Answer on Stack Overflow