FlatBuffers performance in Java

TL;DR: FlatBuffers is not the cure for performance issues in Java

“Always right” attitude

Developers like to think of themselves as rational, methodical and driven by facts instead of beliefs. That gave rise to the popular myth of a software field driven by meritocracy instead of popularity, fashion or tribalism. This way we can feel better about ourselves and look down on the fashion industry. In reality, we’ll jump on the next most popular (err, best) tool that came from our popular designers (err, smartest developers) solving our world problems (err, first world problems).

Using the most popular tool

Everything new is better

FlatBuffers is a nice serialization protocol from Google. While the idea is nothing new and people have been doing such things for a long time, it tries to go a step further and be cross-platform compatible. In doing so, several optimizations that work in some unmanaged languages stop working in most managed ones.

This gave rise to the myth of how to solve serialization problems in Java: “Just replace your JSON library with FlatBuffers”. Years before it was: “Just replace your JSON library with Protocol Buffers”.
Even Facebook solves their performance problems on Android the same way.

When someone actually digs into the code and tries to understand and explain how such a protocol works, they sound like a mad prophet.

Furious at the world

Benchmarking FlatBuffers

Google was kind enough to release public benchmarks alongside its library. Conveniently, they only benchmarked the C++ implementation. What was not nice is the impression that other implementations behave the same. At least they did provide benchmarks, which is a great improvement over the previous attitude that benchmarks are misleading and won’t tell you anything. That attitude carried over to the CapnProto library, which shares the same underlying problems in managed languages as FlatBuffers.

The most popular benchmark for the JVM includes various other libraries. While it’s not perfect, it’s the go-to benchmark for Java. Unfortunately, updates to the repository have stalled, but it still provides an excellent starting point for comparing libraries. The latest run includes FlatBuffers and CapnProto.

If you are expecting those two near the top, you will be surprised. Since the CapnProto Java implementation is not official, it gets a pass. FlatBuffers, on the other hand, is maintained by Google. Since they advertise FlatBuffers as a cure for cancer, you would expect it to do better when put to the test.

There is a lot of room for improvement in the Java FlatBuffers implementation. If Google wanted, they could improve the implementation (or hire someone from outside), so that it is at least not behind JSON. But when you are popular/big you can push an invalid point such as: “When serializing data from statically typed languages, however, JSON not only has the obvious drawback of runtime inefficiency, but also forces you to write more code to access data (counterintuitively) due to its dynamic-typing serialization system” and people will accept it without critical thinking.

Understanding FlatBuffers

To better understand the pros and cons of FlatBuffers, let’s first analyze some of the popular arguments:

While JSON has a “schema” embedded with the data, most of the time JSON is just a transport layer for some other schema. In the most trivial cases this is done by serializing objects as strings in JSON while they are actually some other type, e.g. UUID, LocalDate, …
While “runtime inefficiencies” in converting objects into “strings” exist, unless your payload consists mostly of primitives and collections of primitives, those “inefficiencies” are minuscule.
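To make the "JSON as a transport layer for another schema" point concrete, here is a minimal sketch in plain Java. The `Order` class, its fields and the hand-rolled JSON handling are assumptions made up for illustration (a real application would use a JSON library); the only point is that the UUID and the LocalDate travel as strings and get their real types back on the receiving side.

```java
import java.time.LocalDate;
import java.util.UUID;

final class StringTransportSketch {
    // Hypothetical domain object, not taken from any real application.
    static final class Order {
        final UUID id;
        final LocalDate placedOn;

        Order(UUID id, LocalDate placedOn) {
            this.id = id;
            this.placedOn = placedOn;
        }
    }

    // Both fields go over the wire as JSON strings; the JSON itself carries no UUID or date type.
    static String toJson(Order o) {
        return "{\"id\":\"" + o.id + "\",\"placedOn\":\"" + o.placedOn + "\"}";
    }

    // The receiver applies the "real" schema: it knows which strings are UUIDs and which are dates.
    static Order fromJson(String json) {
        return new Order(
                UUID.fromString(extract(json, "id")),
                LocalDate.parse(extract(json, "placedOn")));
    }

    // Naive extraction that only works for the fixed shape produced by toJson above.
    static String extract(String json, String key) {
        String marker = "\"" + key + "\":\"";
        int start = json.indexOf(marker) + marker.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        Order original = new Order(UUID.randomUUID(), LocalDate.of(2016, 3, 1));
        String json = toJson(original);
        Order decoded = fromJson(json);
        System.out.println(json);
        System.out.println(decoded.id.equals(original.id) && decoded.placedOn.equals(original.placedOn));
    }
}
```

The conversion cost here is one `toString` and one parse per field, which is the kind of "inefficiency" the argument is about.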

Runtime databinding is greatly preferred over rigid generated code. If an application has its own POJOs and just wants to use some library for serialization, writing code for conversion from a POJO into a generated POJO-like monster or into a series of offsets in a byte[] (as in the case of FlatBuffers) leads to much more code for data access. The only thing better than runtime databinding is compile-time databinding 😀
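As a rough illustration of what "a series of offsets in a byte[]" means for the calling code, here is a small sketch using only `java.nio.ByteBuffer`. The layout, the `User` class and all method names are invented for this example; this is not the FlatBuffers wire format or its generated API, just the same general style of access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class OffsetAccessSketch {
    // Hypothetical application POJO.
    static final class User {
        final long id;
        final String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Made-up layout (NOT FlatBuffers): bytes [0, 8) hold the id, bytes [8, 12) hold the
    // offset of the name, and at that offset sit an int length followed by UTF-8 bytes.
    static final int ID_POS = 0;
    static final int NAME_OFFSET_POS = 8;
    static final int HEADER_SIZE = 12;

    // Converting the POJO means writing every field at its position by hand.
    static byte[] encode(User u) {
        byte[] name = u.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + 4 + name.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(ID_POS, u.id);
        buf.putInt(NAME_OFFSET_POS, HEADER_SIZE);
        buf.putInt(HEADER_SIZE, name.length);
        buf.position(HEADER_SIZE + 4);
        buf.put(name);
        return buf.array();
    }

    // Primitive fields can be read in place without building an object.
    static long readId(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(ID_POS);
    }

    // A String cannot be read in place: every call decodes and allocates a new one.
    static String readName(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int offset = buf.getInt(NAME_OFFSET_POS);
        int length = buf.getInt(offset);
        return new String(data, offset + 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = encode(new User(42L, "Ana"));
        System.out.println(readId(data) + " " + readName(data));
    }
}
```

Compare this with `user.id` and `user.name` on the original POJO. Note also that `readName` in this sketch allocates on every call, which hints at why in-place access that is cheap in an unmanaged language does not automatically stay cheap in a managed one.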

If we are sending the whole object over the wire, it’s not really useful to read only parts of it. It’s much better to send only the interesting parts anyway. While this might lead to more code, that’s only because people are used to reusing existing POJOs everywhere.

FlatBuffers doesn’t allow nesting, which causes you to serialize everything as a flat sequence. This prevents it from being used in a streaming environment and from solving actual problems developers have (such as streaming lists of objects).

While schemas are nice and most apps would benefit from writing less POO and more models, a schema for the sake of serialization alone misses the bigger picture.
But that’s another story.
