Coordinated Progress – Part 4 – A Loose Decision Framework

Microservices, functions, stream processors and AI agents represent nodes in our graph. An incoming edge represents a trigger of work for the node, and the node must carry out that work reliably. I have been using the term reliable progress, but I might have used durable execution had that term not already been claimed by a specific category of tooling.
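
To make that concrete, here is a minimal sketch of the graph as types. The names are illustrative, not a real API; they only pin down what a node, an edge, and a trigger mean in this model.

```typescript
// A minimal sketch of the graph model. Nodes do work; edges deliver triggers.

type NodeKind = "microservice" | "function" | "stream-processor" | "ai-agent";
type EdgeKind = "rpc" | "queue" | "topic";

interface Edge {
  kind: EdgeKind;
  from: string; // upstream node id
  to: string;   // downstream node id
}

interface Trigger {
  edge: Edge;       // the incoming edge that delivered the work
  payload: unknown; // the work itself
}

interface WorkNode {
  id: string;
  kind: NodeKind;
  // An incoming edge delivers a trigger; the node must reliably
  // complete the work that the trigger demands.
  handle(trigger: Trigger): Promise<void>;
}
```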

Coordinated Progress – Part 3 – Coupling, Synchrony and Complexity

In part 2, we built a mental framework using a graph of nodes and edges to represent distributed work. Workflows are subgraphs coordinated via choreography or orchestration. Reliability, in this model, means reliable progress: the result of reliable triggers and progressable work.
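
As a rough sketch of the two coordination styles, with hypothetical helpers (subscribe, publish, call) standing in for real infrastructure:

```typescript
interface Order { id: string }

// Hypothetical infrastructure helpers, declared only so the sketch
// type-checks; any event bus or RPC client could fill these roles.
declare function subscribe(topic: string, handler: (msg: Order) => Promise<void>): void;
declare function publish(topic: string, msg: Order): Promise<void>;
declare function call(service: string, method: string, arg: Order): Promise<void>;
declare function reserveStock(order: Order): Promise<void>;

// Choreography: no central coordinator. Each node reacts to an event
// and emits the next; the workflow is implicit in the subscriptions.
subscribe("order-placed", async (order) => {
  await reserveStock(order);
  await publish("stock-reserved", order);
});

// Orchestration: one coordinator owns the workflow and invokes each
// step explicitly; the workflow is explicit in this one function.
async function placeOrderWorkflow(order: Order): Promise<void> {
  await call("inventory", "reserveStock", order);
  await call("payments", "charge", order);
  await call("shipping", "dispatch", order);
}
```

The same subgraph exists either way; what differs is whether the workflow is scattered across subscriptions or centralized in a coordinator.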

In part 3, we refine this graph model in terms of the different types of coupling between nodes and the ways in which edges can be synchronous or asynchronous. Let’s set the scene with an example, then dissect that example using the concepts of coupling and communication styles.
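
The difference between the two edge styles comes down to what the caller waits for. A minimal sketch, again with hypothetical client names:

```typescript
interface Order { id: string }
interface Receipt { txId: string }

// Hypothetical clients for the two edge types.
declare const rpcClient: { send(method: string, arg: Order): Promise<Receipt> };
declare const queue: { enqueue(name: string, msg: Order): Promise<void> };

// Synchronous edge: the caller blocks until the downstream node has done
// the work, so both sides must be up at the same time (temporal coupling).
async function chargeSync(order: Order): Promise<Receipt> {
  return await rpcClient.send("billing.charge", order); // waits for the result
}

// Asynchronous edge: the caller waits only for the broker to accept the
// message; the downstream node does the work whenever it consumes it.
async function chargeAsync(order: Order): Promise<void> {
  await queue.enqueue("billing-requests", order); // waits only for the enqueue ack
}
```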

Coordinated Progress – Part 2 – Making Progress Reliable

In part 1, we described distributed computation as a graph and constrained the graph for this analysis to microservices, functions, stream processing jobs, and AI agents as nodes, and RPC, queues, and topics as the edges.

Within our definition of The Graph, a node might be a function (FaaS or microservice), a stream processing job, an AI agent, or some kind of third-party service. An edge might be an RPC channel, a queue, or a topic.

For a workflow to be reliable, it must be able to make progress despite failures and other adverse conditions. Progress typically depends on durability at the node and edge levels.
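
A minimal sketch of what edge-level durability buys us, assuming a queue that supports explicit acknowledgements (the client API here is hypothetical): the trigger is acknowledged only after the work completes, so a crash mid-work means redelivery rather than lost progress.

```typescript
interface Message { id: string; payload: unknown }

// Hypothetical queue client with explicit acks.
declare const queue: {
  receive(name: string): Promise<Message>;
  ack(name: string, messageId: string): Promise<void>;
};
declare function doWork(payload: unknown): Promise<void>;

// Because the ack happens only after the work succeeds, a crash while
// handling a message leaves it on the queue to be redelivered: the
// trigger is durable, and the workflow can still make progress.
async function consumeForever(queueName: string): Promise<void> {
  while (true) {
    const msg = await queue.receive(queueName); // durable trigger
    await doWork(msg.payload);                  // progressable work
    await queue.ack(queueName, msg.id);         // only now is the trigger consumed
  }
}
```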

Coordinated Progress – Part 1 – Seeing the System: The Graph

At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it be event-driven? Maybe a workflow orchestration?” Cue a flurry of opinions, whiteboard arrows, and vague references to sagas.

Now that I work for a streaming data infra vendor, I get asked: “How do event-driven architecture, stream processing, orchestration, and the new durable execution category relate to one another?”

These are deceptively broad questions, touching everything from architectural principles to practical trade-offs. To be honest, I had an instinctive understanding of how these pieces fit together, but I’d never written it down. This series is how I see it, my mental framework, and I hope it will be useful and understandable to you.