Jack Vanlightly

Kafka vs Redpanda Performance - Do the claims add up?

Apache Kafka has been the most popular open source event streaming system for many years and it continues to grow in popularity. Within the wider ecosystem there are other open source and source available competitors to Kafka such as Apache Pulsar, NATS Streaming, Redis Streams, RabbitMQ and more recently Redpanda (among others).

Redpanda is a source available Kafka clone written in C++ using the Seastar framework from ScyllaDB, a wide-column database. It uses the popular Raft consensus protocol for replication and all distributed consensus logic. Redpanda has been going to great lengths to explain that its performance is superior to Apache Kafka due to its thread-per-core architecture, use of C++, and its storage design that can push high performance NVMe drives to their limits.

They list a bold set of claims and those claims seem plausible. Built in C++ for modern hardware with a thread-per-core architecture sounds compelling and it seems logical that the claims must be true. But are they?

Is sequential IO dead in the era of the NVMe drive?

Is sequential IO dead in the era of the NVMe drive?

Two systems I know pretty well, Apache BookKeeper and Apache Kafka, were designed in the era of the spinning disk, the hard-drive or HDD. Hard-drives are good at sequential IO but not so good at random IO because of the relatively high seek times. No wonder then that both Kafka and BookKeeper were designed with sequential IO in mind.

Both Kafka and BookKeeper are distributed log systems and so you’d think that sequential IO would be the default for an append-only log storage system. But sequential and random IO sit on a continuum, with pure sequential on one side and pure random IO on the other. If you have 5000 files which you are appending to in small writes in a round-robin manner, and performing fsyncs, then this is not such a sequential IO access pattern, it sits further to the random IO side. So just by being an append-only log doesn’t mean you get sequential IO out of the gate.