What is Protobuf?

6 minute read

Published:

What is Protobuf and it’s advantages?

What is Protobuf ?

Protocol Buffers are a mechanism for serializing structured data.

  • Basically, Protocol Buffers (often referred to as Protobuf) is a** data format** developed by Google used for serializing data, similar to XML or JSON.

  • Protobuf is like JSON, except it’s smaller and faster.

  • It is particularly designed to be efficient in terms of both size and speed, making it suitable for communication protocols and data storage.

What is Data serialization ?

  • Data serialization is the process of converting complex data structures, such as objects or arrays, into a format that can be easily transmitted over a network or stored in a file.

  • Essentially it is the process of converting some in-memory object to a format that could be easily stored or sent over the network.

    Whats data format ?

A data format is a specific representation or arrangement of data. Data formats define how information is structured, transmitted and encoded.

Different type of data formats:

  • Text-based Formats: CSV, JSON, XML
  • Binary Formats: Avro, Protocol Buffers (protobuf)
  • Database-specific Formats: SQLite Database File (.db ), MySQL Dump (.sql)

So how is Protobuf different from JSON ??

JSON:

  • It is lightweight data interchange format that is easy for humans to read and write.

  • In JSON Human readability is important making it auitable for configuration files, APIs, and scenarios where human readability is important.

  • But its Slower and bulkier: It has slowe serialization and deserialization compared to binary formats. It is less compact, leading to larger data sizes.

Protobuf: Protobuf is a binary format

  • Optimized for efficient data interchange

  • It is not human-readable and optimized for serialization and efficient data interchange.

  • Used for High-performance use cases

What are binary formats ?

  • A binary format is a method of encoding data as a sequence of binary digits (0 and 1) for storage or transmission.
  • In contrast to text-based formats like JSON and XML, binary formats are not human-readable and must be parsed by a program or library in order to be understood.

How do Protocol Buffers Work?

  • First, You define how you want your data to be structured

  • Then you can use special generated source code to write and read your structured data to a variety of data streams

[https://protobuf.dev/overview/](https://protobuf.dev/overview/)https://protobuf.dev/overview/

Detailed Steps :

  1. Define Schema : Protocol Buffers (Protobuf) work by defining a structured schema for data using a interface description language.
  • Create a .proto file to define the structure of their data using the Protobuf schema definition language. Example:

    syntax = “proto3”; message Person { string name = 1; int32 id = 2; bool has_ponycopter = 3; }

  1. Compile Schema: The .proto file is compiled using a Protobuf compiler (protoc).
  • This compiler generates language-specific code (such as Java, Python, etc.) based on the defined schema. Example command for generating Python code:

    protoc - python_out=. your_proto_file.proto

  1. **Generate Code : **The generated code includes classes for each message type, as well as methods for serialization and deserialization. These methods are optimized for efficiency and are used to convert data between its in-memory representation and the serialized binary format.

    class Person: def init(self): self.name = “” self.id = 0 self.has_ponycopter = False

    def serialize_to_bytes(self): # Serialization logic

    def parse_from_bytes(self, data): # Deserialization logic

  2. Serialize Data: Now create instances of the generated classes, set values for the fields, and use the provided serialization methods to convert the data into a binary format.

    person = Person() person.name = “John Doe” person.id = 123 person.has_ponycopter = True serialized_data = person.serialize_to_bytes()

[https://medium.com/javarevisited/what-are-protocol-buffers-and-why-they-are-widely](https://medium.com/javarevisited/what-are-protocol-buffers-and-why-they-are-widely-used-cbcb04d378b6)https://medium.com/javarevisited/what-are-protocol-buffers-and-why-they-are-widely

  1. Deserialize data on receving side: The binary data can be transmitted over a network, stored in a file, or sent between systems.
  • On the receiving end, the binary data is deserialized using the generated code, reconstructing the original in-memory data structure.

    new_person = Person() new_person.parse_from_bytes(serialized_data) print(new_person.name) # Output: John Doe print(new_person.id) # Output: 123 print(new_person.has_ponycopter) # Output: True

Advantages of using protocol buffers :

  • Compact data storage

  • Fast parsing

  • Availability in many programming languages

  • Optimized functionality through automatically-generated classes

Key features of Protocol Buffers:

  1. Provides Schema Definition Language: Protobuf uses schema to define the structure of the data to be serialized. This schema is defined in a language-agnostic format and serves as a contract between systems.

  2. Uses Binary Format: Protobuf serializes data into a binary format, which is more compact compared to human-readable formats like JSON or XML. This results in smaller payload sizes, making it more efficient in terms of bandwidth and storage.

  3. Provides **Multiple Language Support **(Language-Agnostic Communication): Protobuf provides compilers for multiple languages (e.g., Java, C++, Python). The language-agnostic nature of Protobuf allows systems implemented in different programming languages. As long as they share a common schema, data can be serialized and deserialized

  4. Provides Backward and Forward Compatibility: Protobuf is designed to be backward and forward compatible. New fields can be added to messages without breaking compatibility with existing code. Unknown fields are ignored, and missing fields are treated as having default values.

  5. High Performance: The binary format and code generation contribute to faster serialization and deserialization compared to some other data interchange formats.

  6. Is Extensibile: Protobuf supports extending existing message types, allowing for the addition of new fields without modifying the original message definition.

When to use Protobuf?

Protocol Buffers (Protobuf) are suitable for scenarios where efficient serialization, fast data exchange, and compact data representation are important.

  • Faster Communication Between Services : Protobufs compact binary format reduces network bandwidth usage, and its efficient serialization and deserialization contribute to faster communication between services.

  • For Web APIs and RPC (Remote Procedure Call): Protobuf is a good choice for defining APIs and performing remote procedure calls. Its language-agnostic nature allows for interoperability between services implemented in different programming languages.

  • Logging Data: When logging data, especially in high-throughput systems, using Protobuf can result in smaller log files compared to text-based formats like JSON or XML. This can be beneficial for both storage and log analysis.

  • In Message Queue Systems: Protobuf can be used in message queues to serialize messages before publishing and deserialize them upon consumption. This can improve the overall performance of message passing systems.

  • Performance-Critical Applications: In applications where serialization and deserialization performance are critical, such as real-time systems or high-frequency trading platforms, Protobuf’s efficient binary format can provide a performance advantage over other serialization formats.

tldr:

Protobuf is a binary format.

  • Its optimized for efficient data interchange

  • It is not human-readable and optimized for serialization and efficient data interchange.

  • Its used for high-performance use cases