A Conceptual Model for Storage Unification

Object storage is taking over more of the data stack, but low-latency systems still need separate hot-data storage. Storage unification is about presenting these heterogeneous storage systems and formats as one coherent resource. The goal is not one storage system and one storage format to rule them all, but a single logical view that virtualizes them.

The primary use case for this unification is stitching real-time and historical data together under one abstraction. We see such unification in various data systems:

  • Tiered storage in event streaming systems such as Apache Kafka and Pulsar

  • HTAP databases such as SingleStore and TiDB

  • Real-time analytics databases such as Apache Pinot, Druid and ClickHouse
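
To make the idea of stitching real-time and historical data under one abstraction concrete, here is a minimal sketch. It is not how any of the systems above implement it. The names Record, Tier and UnifiedLog are hypothetical, and the sketch assumes an offset-ordered log in which the cold tier holds older data, the hot tier holds newer data, and the two may overlap while data is being moved.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    offset: int
    value: str


class Tier:
    """Hypothetical in-memory stand-in for one storage tier."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records: List[Record] = sorted(records, key=lambda r: r.offset)

    def end_offset(self) -> int:
        # One past the last offset held by this tier (0 if empty).
        return self._records[-1].offset + 1 if self._records else 0

    def read(self, start: int) -> Iterator[Record]:
        return (r for r in self._records if r.offset >= start)


class UnifiedLog:
    """Presents a cold (historical) and hot (real-time) tier as one log."""

    def __init__(self, cold: Tier, hot: Tier) -> None:
        self.cold = cold
        self.hot = hot

    def read(self, start: int = 0) -> Iterator[Record]:
        # Serve everything the cold tier has first.
        yield from self.cold.read(start)
        # Continue from the hot tier, skipping any offsets the cold tier
        # already covered so overlapping records are not returned twice.
        yield from self.hot.read(max(start, self.cold.end_offset()))


cold = Tier([Record(0, "a"), Record(1, "b"), Record(2, "c")])
hot = Tier([Record(2, "c"), Record(3, "d"), Record(4, "e")])
log = UnifiedLog(cold, hot)
print([r.offset for r in log.read(0)])
```

The caller reads by offset and never learns which tier served a record. The handover point between tiers is the only place where the two storage systems need to agree.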

The next frontier in this unification is the lakehouse, where real-time data is combined with historical lakehouse data. Over time we will see increasingly deep lakehouse integration with lower-latency data systems.

In this post, I build a high-level conceptual framework for understanding the building blocks that data systems can use for storage unification and the trade-offs involved. I’ll cover seven key considerations when evaluating design approaches. I want to write in the future about how different real-world systems do storage unification, and this post defines the common set of terms I will use.

Remediation: What happens after AI goes wrong?

If you’re following the world of AI right now, no doubt you saw Jason Lemkin’s post on social media reporting how Replit’s AI deleted his production database, despite being told not to touch anything at all because of a code freeze. After deleting his database, the AI even advised him that a rollback would be impossible and the data was gone forever. Luckily, he went against that advice, performed the rollback, and got his data back.

Then, a few days later I stumbled on another case, this time of the Gemini CLI agent deleting Anurag Gupta’s files. He was just playing around, kicking the tires, but the series of events that took place is illuminating.

These incidents show AI agents making mistakes, but they also show agents failing to recover. In both cases, the AI not only broke something but also couldn't fix it. That’s why remediation needs to be a first-class concern in AI agent implementations.

The Cost of Being Wrong

A recent LinkedIn post by Nick Lebesis caught my attention with this brutal take on the difference between good startup founders and coward startup founders. I recommend you read the entire thing to fully understand the context, but I’ve pasted the part that most resonated with me below:

"Real founders? They make the wrong decision at 9am. Fix it by noon. Ship by 5. Coward founders are still scheduling the kickoff meeting. Your job isn't to be liked. Your job is to be clear. Wrong but decisive beats right but timid... every single time. Committees don't build companies. Convictions do."

It's harsh, but there's truth here that extends well beyond startups into how we approach technical decision-making in software development, even in large organizations. 

Responsibility Boundaries in the Coordinated Progress model

Building on my previous work on the Coordinated Progress model, this post examines how reliable triggers not only initiate work but also establish responsibility boundaries. Where a reliable trigger exists, a new boundary is created where that trigger becomes responsible for ensuring the eventual execution of the sub-graph of work downstream of it. The boundaries can even layer and nest, especially in orchestrated systems that overlay finer-grained boundaries.