What does the protobuf text format look like?

SerializationTextProtocol Buffers

Serialization Problem Overview


Google Protocol Buffers can not only be serialized in binary format, also be serialized as text, known as textproto. However I can't easily find examples of such text; what would it look like?

Expected answer: an example covering all features allowed by the protobuf IDL/proto file including a sample protobuf packet in textual form.

Serialization Solutions


Solution 1 - Serialization

Done myself:

test.proto

enum MyEnum
{
    Default = 0;
    Variant1 = 1;
    Variant100 = 100;
}

message Test {
    required string f1 = 1;
    required int64 f2 = 2;
    repeated uint64 fa = 3;
    repeated int32 fb = 4;
    repeated int32 fc = 5 [packed = true];
    repeated Pair pairs = 6;
    optional bytes bbbb = 7;

    extensions 100 to max;
}

message Pair {
    required string key = 1;
    optional string value = 2;
}



extend Test {
    optional bool gtt = 100;
    optional double gtg = 101;
    repeated MyEnum someEnum = 102;
}

example output:

f1: "dsfadsafsaf"
f2: 234
fa: 2342134
fa: 2342135
fa: 2342136
fb: -2342134
fb: -2342135
fb: -2342136
fc: 4
fc: 7
fc: -12
fc: 4
fc: 7
fc: -3
fc: 4
fc: 7
fc: 0
pairs {
  key: "sdfff"
  value: "q\"qq\\q\n"
}
pairs {
  key: "   sdfff2  \321\202\320\265\321\201\321\202 "
  value: "q\tqq<>q2&\001\377"
}
bbbb: "\000\001\002\377\376\375"
[gtt]: true
[gtg]: 20.0855369
[someEnum]: Variant1

the program:

#include <google/protobuf/text_format.h>

#include <stdio.h>
#include "test.pb.h"

int main() {
    Test t;
    t.set_f1("dsfadsafsaf");
    t.set_f2(234);
    t.add_fa(2342134);
    t.add_fa(2342135);
    t.add_fa(2342136);
    t.add_fb(-2342134);
    t.add_fb(-2342135);
    t.add_fb(-2342136);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(-12);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(-3);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(0);
    t.set_bbbb("\x00\x01\x02\xff\xfe\xfd",6);

    Pair *p1 = t.add_pairs(), *p2 = t.add_pairs();
    p1->set_key("sdfff");
    p1->set_value("q\"qq\\q\n");
    p2->set_key("   sdfff2  тест ");
    p2->set_value("q\tqq<>q2&\x01\xff");

    t.SetExtension(gtt, true);
    t.SetExtension(gtg, 20.0855369);
    t.AddExtension(someEnum, Variant1);

    std::string str;
    google::protobuf::TextFormat::PrintToString(t, &str);
    printf("%s", str.c_str());

    return 0;
}

Binary protobuf of this sample (for completeness):

00000000  0a 0b 64 73 66 61 64 73  61 66 73 61 66 10 ea 01  |..dsfadsafsaf...|
00000010  18 f6 f9 8e 01 18 f7 f9  8e 01 18 f8 f9 8e 01 20  |............... |
00000020  8a 86 f1 fe ff ff ff ff  ff 01 20 89 86 f1 fe ff  |.......... .....|
00000030  ff ff ff ff 01 20 88 86  f1 fe ff ff ff ff ff 01  |..... ..........|
00000040  2a 1b 04 07 f4 ff ff ff  ff ff ff ff ff 01 04 07  |*...............|
00000050  fd ff ff ff ff ff ff ff  ff 01 04 07 00 32 10 0a  |.............2..|
00000060  05 73 64 66 66 66 12 07  71 22 71 71 5c 71 0a 32  |.sdfff..q"qq\q.2|
00000070  23 0a 14 20 20 20 73 64  66 66 66 32 20 20 d1 82  |#..   sdfff2  ..|
00000080  d0 b5 d1 81 d1 82 20 12  0b 71 09 71 71 3c 3e 71  |...... ..q.qq<>q|
00000090  32 26 01 ff 3a 06 00 01  02 ff fe fd a0 06 01 a9  |2&..:...........|
000000a0  06 ea 19 0c bf e5 15 34  40 b0 06 01              |.......4@...|
000000ac

Note that it's the sample is not completely OK: libprotobuf ERROR google/protobuf/wire_format.cc:1059] Encountered string containing invalid UTF-8 data while parsing protocol buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes.

Note that protoc tool also can decode messages to text, both with the proto file and without:

$ protoc --decode=Test test.proto < test.bin 
[libprotobuf ERROR google/protobuf/wire_format.cc:1091] String field 'value' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. 
f1: "dsfadsafsaf"
f2: 234
fa: 2342134
fa: 2342135
fa: 2342136
fb: -2342134
fb: -2342135
fb: -2342136
fc: 4
fc: 7
fc: -12
fc: 4
fc: 7
fc: -3
fc: 4
fc: 7
fc: 0
pairs {
  key: "sdfff"
  value: "q\"qq\\q\n"
}
pairs {
  key: "   sdfff2  \321\202\320\265\321\201\321\202 "
  value: "q\tqq<>q2&\001\377"
}
bbbb: "\000\001\002\377\376\375"
[gtt]: true
[gtg]: 20.0855369
[someEnum]: Variant1
$ protoc --decode_raw  < test.bin 
1: "dsfadsafsaf"
2: 234
3: 2342134
3: 2342135
3: 2342136
4: 18446744073707209482
4: 18446744073707209481
4: 18446744073707209480
5: "\004\007\364\377\377\377\377\377\377\377\377\001\004\007\375\377\377\377\377\377\377\377\377\001\004\007\000"
6 {
  1: "sdfff"
  2: "q\"qq\\q\n"
}
6 {
  1: "   sdfff2  \321\202\320\265\321\201\321\202 "
  2: "q\tqq<>q2&\001\377"
}
7: "\000\001\002\377\376\375"
100: 1
101: 0x403415e5bf0c19ea
102: 1

Solution 2 - Serialization

Simplified, output from protoc.exe version 3.0.0 on window7 + cygwin

Demo message

$ cat demo.proto
   syntax = "proto3";  package demo;  message demo { repeated int32 n=1; } 

Create a protobuf binary data

$ echo n : [1,2,3]  | protoc --encode=demo.demo demo.proto > demo.bin

Dumping proto data as text

 $ protoc --decode=demo.demo demo.proto < demo.bin
    n: 1
    n: 2
    n: 3

And dump even if you don't have the proto definiton

    $ protoc --decode_raw < demo.bin  
    1: "\001\002\003" 

            

Solution 3 - Serialization

An example from an open-source repo https://github.com/google/nvidia_libs_test/blob/master/cudnn_benchmarks.textproto

convolution_benchmark {
  label: "NHWC_128x20x20x56x160"
  input {
    dimension: [128, 56, 20, 20]
    data_type: DATA_HALF
    format: TENSOR_NHWC
  }
}

More examples across GitHub https://github.com/search?q=extension%3Atextproto https://github.com/search?q=extension%3Apbtxt

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionVi.View Question on Stackoverflow
Solution 1 - SerializationVi.View Answer on Stackoverflow
Solution 2 - SerializationmoshView Answer on Stackoverflow
Solution 3 - SerializationColonel PanicView Answer on Stackoverflow