Python Protobuf
- Description: Protobuf in Python: generating _pb2.py, the message API, parsing/serializing binary and text formats, oneof/repeated/map fields, Any, IsInitialized, and JSON via MessageToJson
- My Notion Note ID: K2A-D2-2
- Created: 2023-06-28
- Updated: 2026-05-11
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. Why Protobuf
- 2. Generating Python Code
- 3. The Generated Message API
- 4. Repeated, Map, and oneof Fields
- 5. Serialization and Parsing
- 6. Any: Pack, Unpack, Is
- 7. IsInitialized and Required Fields
- 8. Two Python Implementations: Pure vs UPB
- 9. Cross-Language Schema Sharing
- 10. References
1. Why Protobuf
- Google's binary serialization format
- A .proto file declares the data; protoc compiles it into classes for many languages
- Wire format: binary, compact, schema-evolving, unknown-field-tolerant
- Python output: a *_pb2.py module
- Python API is more dynamic than C++: fields are Python attributes, repeated fields look like lists, and JSON conversion is easy
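To make "binary, compact" concrete, here is a minimal pure-Python sketch of how two fields are laid out on the wire: a string in field 1 and an integer in field 2. The helper names (encode_varint, encode_tag, and so on) are hypothetical and written only for illustration; real code should use the generated classes. Negative integers and the other wire types are deliberately left out.

```python
VARINT = 0  # wire type for int32, int64, bool, enum
LEN = 2     # wire type for strings, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_tag(field_number, VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    payload = text.encode("utf-8")
    return encode_tag(field_number, LEN) + encode_varint(len(payload)) + payload


# Field 1 = "Yu" (string), field 2 = 42 (integer): six bytes in total.
message = encode_string_field(1, "Yu") + encode_int_field(2, 42)
```

Field names never appear on the wire, only field numbers, which is why renaming a field is safe while renumbering it is not.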
2. Generating Python Code
Install runtime + compiler:
pip install protobuf
# protoc itself (one option):
brew install protobuf # or download from https://github.com/protocolbuffers/protobuf/releases
Compile a .proto:
protoc -I=src --python_out=build src/addressbook.proto
- Produces build/addressbook_pb2.py
- For gRPC stubs: also pass --grpc_python_out=build (requires grpcio-tools, typically invoked as python -m grpc_tools.protoc)
import addressbook_pb2
person = addressbook_pb2.Person()
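The compile-then-import step can also be scripted. This is a minimal sketch, assuming protoc is on PATH and the src/ and build/ directories used in the command above; protoc_command and generate are hypothetical helper names, not part of the protobuf package.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def protoc_command(proto_file: str, include_dir: str = "src", out_dir: str = "build") -> list[str]:
    """Build the protoc invocation shown above as an argument list."""
    return ["protoc", f"-I={include_dir}", f"--python_out={out_dir}", proto_file]


def generate(proto_file: str, include_dir: str = "src", out_dir: str = "build") -> None:
    """Run protoc, then make the output directory importable."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # protoc does not create it
    subprocess.run(protoc_command(proto_file, include_dir, out_dir), check=True)
    if out_dir not in sys.path:
        sys.path.insert(0, out_dir)  # so `import addressbook_pb2` resolves


# Usage (not executed here):
#   generate("src/addressbook.proto")
#   import addressbook_pb2
```

The sys.path step matters because the generated module lands in build/, which is not importable unless it is on the module search path.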
3. The Generated Message API
Given:
syntax = "proto3";
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  repeated string phones = 4;
}
The generated class behaves like a regular Python object:
p = addressbook_pb2.Person()
p.name = "Yu"
p.id = 42
p.email = "yu@example.com" # placeholder address
p.phones.append("555-1234")
p.phones.extend(["555-5678", "555-9012"])
p.name # "Yu"
p.id # 42
# Scalar fields default to the proto3 zero value (0, "", False).
# HasField only works on message-typed and `optional` scalar fields.
sub = addressbook_pb2.Person()
sub.HasField("email") # raises ValueError unless `email` is declared `optional`
p.Clear() # reset all fields
p.ClearField("name") # reset one field
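The presence rules are easy to check without generated code. The sketch below uses google.protobuf.Type, a proto3 message that ships with the protobuf package, purely as a stand-in for Person: it has a plain string field (name) and a message-typed field (source_context).

```python
from google.protobuf import type_pb2

t = type_pb2.Type(name="Person")

# Plain proto3 scalar: asking about presence is an error, not False.
try:
    t.HasField("name")
    scalar_presence = "supported"
except ValueError:
    scalar_presence = "unsupported"

# Message-typed field: presence is tracked.
before = t.HasField("source_context")         # nothing written yet
t.source_context.file_name = "person.proto"   # writing a sub-field marks it present
after = t.HasField("source_context")

t.ClearField("source_context")
cleared = t.HasField("source_context")
```

Merely reading t.source_context does not mark the field as present; only writing into it (or calling SetInParent) does.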
Constructor accepts field kwargs:
p = addressbook_pb2.Person(name="Yu", id=42, phones=["555-1234"])
- Copy a message: dst.CopyFrom(src) (replaces) or dst.MergeFrom(src) (merges)
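The difference between replacing and merging shows up on repeated and unset fields. A small sketch, again using the bundled google.protobuf.Type message as a stand-in for a generated class (an assumption made so the example runs without protoc):

```python
from google.protobuf import type_pb2

src = type_pb2.Type(name="Person", oneofs=["contact"])

dst = type_pb2.Type(name="Old", oneofs=["legacy"])
dst.source_context.file_name = "old.proto"

# MergeFrom: set scalars overwrite, repeated fields are appended,
# and fields that src leaves unset are kept.
merged = type_pb2.Type()
merged.CopyFrom(dst)
merged.MergeFrom(src)

# CopyFrom: dst is cleared first, so the result equals src.
replaced = type_pb2.Type()
replaced.CopyFrom(dst)
replaced.CopyFrom(src)
```

After the merge, name is "Person", oneofs holds both entries in order, and source_context still points at old.proto; after the copy, nothing of dst survives.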
4. Repeated, Map, and oneof Fields
message Order {
  repeated string items = 1;
  map<string, int32> counts = 2;
  oneof payment {
    string card = 3;
    string bank = 4;
  }
}
o = order_pb2.Order()
# repeated: list-like, no assignment of a Python list
o.items.append("apple")
o.items.extend(["banana", "cherry"])
o.items[:] = ["orange"] # full replacement
# map: dict-like
o.counts["apple"] = 3
o.counts.update({"banana": 5})
# oneof: setting one field clears the others
o.card = "1234"
o.WhichOneof("payment") # 'card'
o.bank = "ACME" # now WhichOneof is 'bank'; `card` is cleared
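For repeated message fields (not repeated scalars such as the string phones above), append via .add():
# assumes a schema where phones is `repeated PhoneNumber` and PhoneNumber has a `number` field
phone = p.phones.add()
phone.number = "555-1234"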
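The same three field kinds can be exercised with messages that ship in the protobuf package, so no generated code is needed. In this sketch, struct_pb2.Value stands in for a message with a oneof (named kind), ListValue for a repeated message field (values), and Struct for a map with message values (fields).

```python
from google.protobuf import struct_pb2

# oneof: setting one member clears the previous one.
v = struct_pb2.Value()
v.string_value = "1234"
first = v.WhichOneof("kind")
v.number_value = 42.0
second = v.WhichOneof("kind")
string_still_set = v.HasField("string_value")  # oneof members track presence

# repeated message field: add() appends a new element and returns it.
lst = struct_pb2.ListValue()
item = lst.values.add()
item.string_value = "apple"
lst.values.add().bool_value = True

# map with message values: indexing a missing key creates the entry.
s = struct_pb2.Struct()
s.fields["count"].number_value = 3.0
```

Message-valued map entries cannot be assigned directly; they are created on first access and then mutated in place, as in the last line.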
5. Serialization and Parsing
5.1 Binary (Wire Format)
data = p.SerializeToString() # bytes
p2 = addressbook_pb2.Person()
p2.ParseFromString(data) # raises DecodeError on bad input
# or
p2 = addressbook_pb2.Person.FromString(data)
# Read/write files:
with open("person.bin", "wb") as f:
    f.write(p.SerializeToString())
with open("person.bin", "rb") as f:
    p2.ParseFromString(f.read())
- SerializePartialToString skips the required-field check (proto2 only)
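A round trip with error handling might look like the sketch below. It uses the bundled Timestamp message in place of Person so that it runs without generated code; try_parse is a hypothetical helper name.

```python
from google.protobuf import timestamp_pb2
from google.protobuf.message import DecodeError

ts = timestamp_pb2.Timestamp(seconds=1, nanos=2)
data = ts.SerializeToString()  # bytes in wire format

decoded = timestamp_pb2.Timestamp.FromString(data)


def try_parse(raw: bytes):
    """Return the parsed message, or None if the bytes are malformed."""
    msg = timestamp_pb2.Timestamp()
    try:
        msg.ParseFromString(raw)
    except DecodeError:
        return None
    return msg


# A tag for field 1 with no value after it: truncated input.
truncated = try_parse(b"\x08")
```

Note that ParseFromString only rejects structurally invalid bytes. Bytes that happen to be a valid encoding of some other message type will usually parse without error, so the caller must know which type to expect.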
5.2 Text Format
- Human-readable; useful for logs, debugging, and golden test fixtures
from google.protobuf import text_format
s = text_format.MessageToString(p) # str
p2 = addressbook_pb2.Person()
text_format.Parse(s, p2) # also: text_format.Merge
# Round-trip via file:
with open("person.txt", "w") as f:
    f.write(text_format.MessageToString(p))
with open("person.txt", "r") as f:
    text_format.Parse(f.read(), p2)
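Parse and Merge differ in how they treat a singular field that appears more than once in the text. A short sketch, using the bundled Timestamp message as a stand-in for Person:

```python
from google.protobuf import text_format, timestamp_pb2

ts = timestamp_pb2.Timestamp(seconds=1, nanos=2)
text = text_format.MessageToString(ts)
one_line = text_format.MessageToString(ts, as_one_line=True)

# Both Parse and Merge return the message they were given.
parsed = text_format.Parse(text, timestamp_pb2.Timestamp())

duplicated = "seconds: 1 seconds: 5"

# Parse rejects a singular field that is given twice...
try:
    text_format.Parse(duplicated, timestamp_pb2.Timestamp())
    parse_accepts_duplicates = True
except text_format.ParseError:
    parse_accepts_duplicates = False

# ...while Merge accepts it and keeps the last value.
merged = text_format.Merge(duplicated, timestamp_pb2.Timestamp())
```

For golden test fixtures, Parse is usually the safer choice because an accidental duplicate fails loudly instead of silently winning.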
5.3 JSON
from google.protobuf import json_format
s = json_format.MessageToJson(p, indent=2, preserving_proto_field_name=True)
p2 = addressbook_pb2.Person()
json_format.Parse(s, p2) # ParseError on unknown fields (by default)
- MessageToDict / ParseDict: go through Python dicts when bridging to libraries that expect dicts
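The dict variants take the same options as the JSON string functions. The sketch below uses the bundled SourceContext message (a single string field, file_name) as a stand-in for Person so that it runs without generated code.

```python
from google.protobuf import json_format, source_context_pb2

ctx = source_context_pb2.SourceContext(file_name="addressbook.proto")

# Default output uses lowerCamelCase JSON names.
as_camel = json_format.MessageToDict(ctx)
# preserving_proto_field_name keeps the names from the .proto file.
as_proto = json_format.MessageToDict(ctx, preserving_proto_field_name=True)

# The parser accepts either spelling.
restored = json_format.ParseDict(as_camel, source_context_pb2.SourceContext())

# Unknown keys are an error unless explicitly ignored.
payload = {"file_name": "addressbook.proto", "extra": 1}
try:
    json_format.ParseDict(payload, source_context_pb2.SourceContext())
    strict_ok = True
except json_format.ParseError:
    strict_ok = False
lenient = json_format.ParseDict(
    payload, source_context_pb2.SourceContext(), ignore_unknown_fields=True
)
```

ignore_unknown_fields=True is the usual choice when the producer may be running a newer schema than the consumer.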
6. Any: Pack, Unpack, Is
- google.protobuf.Any carries an arbitrary serialized message plus its type URL
from google.protobuf import any_pb2
a = any_pb2.Any()
a.Pack(p) # wrap a Person
if a.Is(addressbook_pb2.Person.DESCRIPTOR):
    p2 = addressbook_pb2.Person()
    a.Unpack(p2)
- Pack records the type URL (type.googleapis.com/<full_name> by default)
- Is checks the URL against a descriptor
- Unpack decodes the bytes into the given message; returns True on success
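What happens on a type mismatch is worth seeing once. In this sketch the bundled Timestamp and Duration messages stand in for application types, so it runs without generated code.

```python
from google.protobuf import any_pb2, duration_pb2, timestamp_pb2

ts = timestamp_pb2.Timestamp(seconds=1)

a = any_pb2.Any()
a.Pack(ts)
url = a.type_url

matches_ts = a.Is(timestamp_pb2.Timestamp.DESCRIPTOR)
matches_dur = a.Is(duration_pb2.Duration.DESCRIPTOR)

# Unpack into the wrong type: returns False and leaves the target untouched.
wrong = duration_pb2.Duration()
unpacked_wrong = a.Unpack(wrong)

# Unpack into the right type: returns True and fills the target.
right = timestamp_pb2.Timestamp()
unpacked_right = a.Unpack(right)
```

Because Unpack signals a mismatch through its return value rather than an exception, ignoring that value can leave an empty message in use.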
7. IsInitialized and Required Fields
- required fields only exist in proto2
- IsInitialized(): True when all required fields (recursively) are set
if not p.IsInitialized():
    missing = p.FindInitializationErrors()
    raise ValueError(f"missing: {missing}")
- In proto3, trivially True: there are no required fields
- Escape hatch for presence tracking: proto3 optional (experimental from 3.12, on by default since 3.15) restores HasField for scalars
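Required-field checks can be tried on a proto2 message that ships with the protobuf package. The sketch below uses descriptor_pb2.UninterpretedOption.NamePart, which declares two required fields (name_part and is_extension), purely as a convenient stand-in for an application message.

```python
from google.protobuf import descriptor_pb2
from google.protobuf.message import EncodeError

part = descriptor_pb2.UninterpretedOption.NamePart()

initialized_before = part.IsInitialized()
missing = part.FindInitializationErrors()  # names of the unset required fields

# SerializeToString enforces the check...
try:
    part.SerializeToString()
    strict_serialize_ok = True
except EncodeError:
    strict_serialize_ok = False

# ...SerializePartialToString skips it.
partial = part.SerializePartialToString()

part.name_part = "deprecated"
part.is_extension = False  # an explicit assignment counts, even to the default value
initialized_after = part.IsInitialized()
```

The last assignment illustrates proto2 presence semantics: a required field is satisfied by being set, not by holding a non-default value.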
8. Two Python Implementations: Pure vs UPB
- Pure Python: slow but easy to debug
- UPB / C extension (upb since 4.x, was cpp before): 5–20× faster
Force a backend:
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python # or "upb" / "cpp"
- API-compatible across backends
- Error messages and edge-case behavior differ; test against the backend you ship
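To confirm which backend a process actually loaded, the package exposes a helper in google.protobuf.internal.api_implementation. Note that this is an internal module, so treat the sketch below as a debugging aid rather than a stable API. The environment variable only takes effect if it is set before google.protobuf is first imported in the process.

```python
import os

# Must run before the first `import google.protobuf...` in this process;
# setdefault leaves an existing choice from the environment untouched.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")

from google.protobuf.internal import api_implementation

# One of "python", "upb", or "cpp", depending on what was loaded.
backend = api_implementation.Type()
print(f"protobuf backend: {backend}")
```

Logging this value at startup in tests and in production is a cheap way to notice when the two environments run different backends.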
9. Cross-Language Schema Sharing
- The .proto file is the contract; generated code is a per-language build artifact
Common layout:
proto/                          # canonical schemas, checked in
  addressbook.proto
generated/
  cpp/addressbook.pb.{h,cc}
  python/addressbook_pb2.py
  go/addressbook.pb.go
- Build systems (Bazel, CMake + custom rules, buf) regenerate from proto/ to keep languages in sync
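Without a full build system, a small check can at least flag generated Python modules that have fallen behind their schemas. This is a rough sketch under stated assumptions: it follows the proto/ and generated/python/ layout above, covers only the Python output, and compares file modification times, which is a heuristic (a fresh git checkout does not preserve them). stale_python_outputs is a hypothetical helper name.

```python
from pathlib import Path


def stale_python_outputs(proto_dir, generated_dir) -> list[str]:
    """Return .proto file names whose *_pb2.py is missing or older than the schema."""
    stale = []
    for proto in sorted(Path(proto_dir).glob("*.proto")):
        out = Path(generated_dir) / f"{proto.stem}_pb2.py"
        if not out.exists() or out.stat().st_mtime < proto.stat().st_mtime:
            stale.append(proto.name)
    return stale


# Usage (paths follow the layout above):
#   stale = stale_python_outputs("proto", "generated/python")
#   if stale:
#       raise SystemExit(f"regenerate: {stale}")
```

A stricter alternative is to regenerate into a scratch directory in CI and diff it against the checked-in output.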