About Protocol Buffers for Data Serialization
Before jumping into Protobuf, we will first understand data serialization.
What is Data Serialization?
Data serialization is the process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. In some cases, a secondary intention of data serialization is to minimize the data’s size, which in turn reduces disk space or bandwidth requirements.
We can simply store our data as a plain text file. For example, to store a database table we can create a text file where columns are separated by spaces and rows by newlines. This is a kind of serialization too, since it is one way of laying data out on disk. Now, if we want to read that data back, we have to write the deserialization (the opposite of serialization) code. We could write a shared library to read and write data in this format in a language of our choice, say Go. But what if someone wants to access this data from a language other than Go? They have to go through the same amount of work writing that library in a different language. Also, what if we had many different types of databases (databases with different schemas)? That would require a lot of repeated work, which is error-prone to maintain. That is why we need a serialization technique: it lets us avoid all that repeated work. Some popular data serialization formats include XML, JSON, and Protobuf. These formats have libraries in almost all popular languages to parse and generate them.
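To make the repeated-work problem concrete, here is a minimal sketch in Go of the kind of hand-rolled text format described above, using a hypothetical Person record. Every language that needs this data would have to re-implement both functions:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Person is a hypothetical record we want to store as text.
type Person struct {
	Name string
	Age  int
}

// serialize writes one row per line, columns separated by spaces.
func serialize(p Person) string {
	return fmt.Sprintf("%s %d\n", p.Name, p.Age)
}

// deserialize is the hand-written inverse; every consumer in every
// language has to re-implement this exact parsing logic.
func deserialize(line string) (Person, error) {
	cols := strings.Fields(line)
	if len(cols) != 2 {
		return Person{}, fmt.Errorf("expected 2 columns, got %d", len(cols))
	}
	age, err := strconv.Atoi(cols[1])
	if err != nil {
		return Person{}, err
	}
	return Person{Name: cols[0], Age: age}, nil
}

func main() {
	line := serialize(Person{Name: "Alice", Age: 30})
	p, err := deserialize(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name, p.Age)
}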
What are Protocol Buffers?
Protocol Buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is like JSON or XML, but smaller, faster, and simpler thanks to its binary encoding. It lets developers define how they want their data structured once, and then use specially generated source code to easily write and read that structured data to and from a variety of data streams, in a variety of languages.
The language-neutral notation used by Protobuf allows you to model messages in a structured format through .proto files:
// person.proto — the required/optional field labels are proto2 syntax.
syntax = "proto2";

message Person {
  required string name = 1;
  required int32 age = 2;
  optional string email = 3;
}
In the example above we have a structure that represents a person’s details: it has mandatory attributes, name and age, as well as an optional email field. Mandatory fields, as the name already says, must be filled when a new message is constructed; otherwise, a runtime error will occur.
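As a sketch of what the generated code looks like in practice, assume the message above lives in person.proto and has been compiled for Go with protoc --go_out=. person.proto; the personpb import path below is hypothetical, as is everything except the field names from the message:

package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"

	personpb "example.com/gen/personpb" // hypothetical generated package
)

func main() {
	// proto2 scalar fields are generated as pointers, so the
	// proto.String/proto.Int32 helpers are the idiomatic way to set them.
	p := &personpb.Person{
		Name: proto.String("Alice"),
		Age:  proto.Int32(30),
		// Email is optional and can be left unset.
	}
	fmt.Println(p.GetName(), p.GetAge(), p.GetEmail()) // GetEmail() is "" when unset
}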
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. A common pattern is to use Protobuf to encode messages in the source service before producing them to Apache Kafka, and to decode them in the destination service after consuming.
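A minimal sketch of that round trip, reusing the hypothetical personpb package from above (the Kafka client itself is elided, since any client that carries []byte message values will do):

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/proto"

	personpb "example.com/gen/personpb" // hypothetical generated package
)

func main() {
	// Producer side: encode to the compact binary wire format before
	// handing the bytes to the Kafka client as the message value.
	original := &personpb.Person{
		Name: proto.String("Alice"),
		Age:  proto.Int32(30),
	}
	payload, err := proto.Marshal(original)
	if err != nil {
		log.Fatal(err)
	}

	// Consumer side: decode the bytes received from Kafka back into a Person.
	decoded := &personpb.Person{}
	if err := proto.Unmarshal(payload, decoded); err != nil {
		log.Fatal(err)
	}
	fmt.Println(decoded.GetName(), decoded.GetAge())
}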
Why Protocol Buffers?
Protocol Buffers allow us to declare a minimal set of information describing the format of records in a log, while the compiler does the work of generating all the serialization and deserialization code in many different languages. On top of that, we get backwards compatibility with old versions, lots of free optimizations, validation, and extensibility, as well as gains in encoding and decoding speed and in the size of the data on the wire.
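Backwards compatibility, for example, falls out of the numbered tags: a field added under a fresh tag (the hypothetical phone field below) is skipped by old readers, and new readers of old data simply see it as unset:

// person.proto, version 2: phone is new. Binaries compiled against
// version 1 still parse these messages, skipping unknown tag 4.
message Person {
  required string name = 1;
  required int32 age = 2;
  optional string email = 3;
  optional string phone = 4; // added later; existing tags 1-3 are unchanged
}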
Conclusion
Protocol Buffers provide several advantages over JSON for sending data over the wire between internal services. While not a replacement for JSON, especially for services consumed directly by a web browser, Protocol Buffers typically win on encoding and decoding speed, on the size of the data on the wire, and more.