May 7, 2024

Learning and reviewing system internals: tactics and psychology

Every now and then I get asked for advice on how to learn about distributed system internals and protocols. Over the course of my career I've picked up a learning and reviewing style that works pretty well for me.

To define these terms, learning and reviewing are similar but not the same:

Learning about how a system works is the easier of the two. By the means available to you (books, papers, blogs, code), you study the system to understand how it works and why it works that way.
Reviewing a system requires learning but also involves opinions, taking positions, making judgments. It is trickier to get right, more subjective, and often only time can show you if you were right or wrong about it and to what degree.

We all review systems to one degree or another, even if it's just a casual review where the results are some loosely held opinions shared by the coffee machine. But when it comes to sharing our opinions in more formal contexts, such as an architecture meeting, a blog post, a conference talk or a job interview, the stakes are higher and the risks greater. If you review a system and come to some conclusions, how do you know if you are right? What happens if you are wrong? Someone could point out your flawed arguments. You could make a bad decision. Not only can reviewing complex systems be hard, it can be scary too.

Jack Vanlightly

May 2, 2024

Strategy and commentary

Hybrid Transactional/Analytical Storage

Confluent has made two key feature announcements in the spring of 2024:

Freight Clusters, a new cluster type that writes directly to object storage. It is aimed at the “freight” of data streaming workloads (log ingestion, clickstreams, large-scale ETL and so on) that can be cost-prohibitive using a low-latency multi-AZ replication architecture in the cloud.
Tableflow, an automated feature that provides seamless materialization of Kafka topics as Apache Iceberg tables (and vice versa in the future).

This trend towards object storage is not just happening at Confluent but across the data ecosystem.