Types of Publishing Failures - RabbitMq Publishing Part 1

In this first part of the series we'll just go over the different failure scenarios on publishing messages and how they can be detected. In the following parts we'll look at example code for tracking message delivery status when performing bulk send operations and single message send operations. We'll also take a look at performance and message duplication.

There are many scenarios where things can go wrong when publishing messages to RabbitMq: the connection can fail, the exchange might not exist, no queue may be bound, a queue might be full, an Erlang process could crash, and so on. This post goes through the various scenarios and explains which of them you can detect, and how.

All examples use the C# RabbitMq Client with Publisher Confirms (rather than Transactions) and with the Mandatory flag set. A publisher confirm is an ack that the broker sends once every queue the message was routed to has accepted it. For a persistent message routed to durable queues, that means the message has been persisted to disk. So as far as the publisher is concerned, its role is complete: RabbitMq has taken responsibility for the message.

The mandatory flag tells RabbitMq that the message must be routable, that is, there must be one or more bound queues that will receive the message. If there are none, the broker returns the message to the publisher instead of silently dropping it.

When using BasicPublish, there are four event handlers that we will need to use to track the state of each message:

BasicAcks - Fires when an ack is received. This can be for a single message or a batch of messages.
BasicNacks - Fires when a nack is received. This can be for a single message or a batch of messages.
BasicReturn - Fires when a message could not be routed to a queue. It is fired once for each undeliverable message.
ModelShutdown - Fires when the channel dies.
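
Because a single ack or nack can cover a batch, the handlers need to resolve every outstanding message up to the delivery tag they receive. The following is a minimal, language-neutral sketch in Python of that bookkeeping. It is not the C# client API: the ConfirmTracker class and its method names are hypothetical, and it assumes each confirm carries a delivery tag plus a "multiple" flag.

```python
class ConfirmTracker:
    """Tracks publish sequence numbers that are still waiting for a confirm."""

    def __init__(self):
        self.pending = set()  # published, no confirm seen yet
        self.acked = set()
        self.nacked = set()

    def on_publish(self, seq_no):
        self.pending.add(seq_no)

    def _resolve(self, delivery_tag, multiple, target):
        # multiple=True covers every pending number up to and including the tag
        if multiple:
            resolved = {n for n in self.pending if n <= delivery_tag}
        else:
            resolved = {delivery_tag} & self.pending
        self.pending -= resolved
        target |= resolved  # in-place update of self.acked or self.nacked

    def on_ack(self, delivery_tag, multiple):
        self._resolve(delivery_tag, multiple, self.acked)

    def on_nack(self, delivery_tag, multiple):
        self._resolve(delivery_tag, multiple, self.nacked)


tracker = ConfirmTracker()
for seq_no in range(1, 6):
    tracker.on_publish(seq_no)

tracker.on_ack(3, multiple=True)    # confirms 1, 2 and 3 in one go
tracker.on_nack(4, multiple=False)  # only message 4 is rejected
print(sorted(tracker.acked), sorted(tracker.nacked), sorted(tracker.pending))
# prints: [1, 2, 3] [4] [5]
```

Message 5 stays pending, which is exactly the state you would be left with if the connection died before its confirm arrived.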

Let's look at the different scenarios.

Connection Failure

Either you'll get an OperationInterruptedException (or a subtype) thrown on calling BasicPublish, or you won't receive an ack; it depends on when the failure occurred. Ideally you want the failure to happen before you send the message, because then you can be pretty sure that RabbitMq didn't receive the message, so you can reestablish the connection and channel and do a retry without creating a duplicate message.

However, the failure could occur after sending the message but before receiving the ack. In this case you simply cannot know if RabbitMq received the message or not. You can resend the message but you might be creating a duplicate message. I always recommend adding a custom header to indicate that the message was republished. That allows the receiver to know that this could be a duplicate and they can perform extra logic to make the operation idempotent (if they don't already do that by default).
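
As a rough illustration of how a receiver might use such a header, here is a small Python sketch. Everything in it is an assumption made for the example: the header name "x-republished", the message shape (a dict with "id", "headers" and "body"), and the in-memory set of processed ids, which a real consumer would need to persist.

```python
REPUBLISHED_HEADER = "x-republished"  # hypothetical name, not a RabbitMq built-in


def handle(message, processed_ids, apply):
    """Apply a message, skipping flagged copies whose id was already applied.

    message: dict with "id", "headers" and "body" keys (an assumed shape).
    processed_ids: set of ids already applied.
    apply: callable that performs the actual work on the body.
    Returns True if the work was done, False if the message was skipped.
    """
    possible_duplicate = message["headers"].get(REPUBLISHED_HEADER, False)
    if possible_duplicate and message["id"] in processed_ids:
        return False  # the original copy was already applied
    apply(message["body"])
    processed_ids.add(message["id"])
    return True


applied = []
seen = set()
original = {"id": "order-1", "headers": {}, "body": "create order"}
retry = {"id": "order-1", "headers": {REPUBLISHED_HEADER: True}, "body": "create order"}

first = handle(original, seen, applied.append)
second = handle(retry, seen, applied.append)
print(first, second, applied)
# prints: True False ['create order']
```

The extra lookup only runs for messages flagged as republished, so the common path stays cheap.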

Mitigation

If you are sending many messages, then calling WaitForConfirms periodically will limit the number of affected messages when a connection failure occurs. BasicPublish is asynchronous and will publish as fast as the connection will allow which means you can have hundreds of messages pending an ack. But by calling WaitForConfirms after publishing messages you block until the confirm(s) comes in (or the timeout occurs). Calling WaitForConfirms after each message would limit your exposure to connection failure to a single message instead of hundreds but it is dramatically slower. So instead you can call WaitForConfirms every X number of messages to get a balance of both. It depends on your specific situation.

When an ack never comes in during the time period set in WaitForConfirms, then no event handler will have been fired for that message so you'll need a different mechanism for detecting the lack of an ack. We'll cover that in the next part.
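
The trade-off described in this section can be sketched as follows. This is a Python model rather than C# client code: publish and wait_for_confirms are stand-in callables for BasicPublish and WaitForConfirms, FakeChannel is a hypothetical test double, and the sketch assumes wait_for_confirms returns False on a nack or a timeout.

```python
def publish_in_batches(messages, batch_size, publish, wait_for_confirms):
    """Publish a list of messages, blocking for confirms every batch_size messages.

    Returns (confirmed, in_doubt, not_sent).
    """
    confirmed, batch = [], []
    for index, message in enumerate(messages):
        publish(message)
        batch.append(message)
        last = index == len(messages) - 1
        if len(batch) == batch_size or last:
            if not wait_for_confirms():
                # No confirm for this batch: every message in it is in doubt.
                return confirmed, batch, list(messages[index + 1:])
            confirmed.extend(batch)
            batch = []
    return confirmed, [], []


class FakeChannel:
    """Stand-in for a real channel: one chosen wait reports a failure."""

    def __init__(self, fail_on_wait):
        self.sent = []
        self.waits = 0
        self.fail_on_wait = fail_on_wait

    def publish(self, message):
        self.sent.append(message)

    def wait_for_confirms(self):
        self.waits += 1
        return self.waits != self.fail_on_wait


channel = FakeChannel(fail_on_wait=2)
messages = [f"m{i}" for i in range(1, 11)]
confirmed, in_doubt, not_sent = publish_in_batches(
    messages, 3, channel.publish, channel.wait_for_confirms)
print(len(confirmed), in_doubt, len(not_sent))
# prints: 3 ['m4', 'm5', 'm6'] 4
```

With a batch size of 3, a failure leaves at most 3 messages in doubt. A batch size of 1 would shrink that to a single message, at the cost of one blocking wait per publish.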

Erlang Process Crash

If the Erlang process responsible for a queue hits an internal error, a nack will be sent, so the BasicNacks event handler will fire. I have never been able to reproduce this situation.

No Exchange

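The broker closes the channel, so the ModelShutdown event handler fires. Because BasicPublish is asynchronous, the failure surfaces on a later operation on that channel, which throws an AlreadyClosedException. The ShutdownReason.ReplyCode will be 404.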

Unroutable

This happens when either no queues are bound to the exchange or no bound queue has a matching binding key. Either way, your message cannot be delivered anywhere.

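When a message is not routable and we have the Mandatory flag set, the BasicReturn event handler will be fired, followed by BasicAcks. So an ack on its own does not mean the message reached a queue; you also need to check whether the message was returned first.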
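
One way to picture the consequence of that ordering is to fold the events seen for a single message into a final state. The Python sketch below is only a model: the event names and state strings are made up for illustration, and correlating a return with a specific publish is a separate problem that it does not address.

```python
PENDING = "pending"
DELIVERED = "delivered"
UNROUTABLE = "unroutable"


def final_state(events):
    """Fold the ordered events seen for one message into its final state.

    events: event names in arrival order, e.g. ["return", "ack"].
    """
    state = PENDING
    for event in events:
        if event == "return":
            state = UNROUTABLE
        elif event == "ack" and state != UNROUTABLE:
            # an ack only means "delivered" if no return arrived before it
            state = DELIVERED
    return state


print(final_state(["ack"]))            # prints: delivered
print(final_state(["return", "ack"]))  # prints: unroutable
print(final_state([]))                 # prints: pending
```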

Queue Full

When a queue is full (its byte or message count limit has been reached), you might assume that publishing a message will fail. But this is not the case by default. It depends on what overflow behaviour is configured for the queue and on whether the queue has a dead letter exchange.

overflow = drop-head (default)

When the queue has no dead letter exchange, a message is removed from the head of the queue, that is, the oldest message, at the end consumers read from. So when you publish a message to a full queue, you can destroy messages that were sent earlier (possibly by a different publisher) and have not yet been consumed.

If the queue has a dead letter exchange, the messages removed from the head of the queue are not destroyed but are sent to the dead letter exchange.

Either way, the publisher of the new message gets an ack as their message has been successfully added to the queue.

overflow = reject-publish

With this setting, a message that arrives at a full queue gets discarded. If publisher confirms are used, the publisher will receive a basic.nack response. Note that even if the queue has a dead letter exchange, the rejected message is not forwarded there; it is always discarded.
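
To make the difference between the two overflow settings concrete, here is a toy Python model of a length-limited queue. It only mirrors the behaviour described above and is not how RabbitMq is implemented: BoundedQueue is a hypothetical class, a plain list stands in for the dead letter exchange, and only a message count limit is modelled.

```python
from collections import deque


class BoundedQueue:
    """Toy model of a length-limited queue (assumes max_length >= 1)."""

    def __init__(self, max_length, overflow="drop-head", dead_letters=None):
        self.max_length = max_length
        self.overflow = overflow
        self.dead_letters = dead_letters  # list standing in for a dead letter exchange
        self.messages = deque()

    def publish(self, message):
        """Returns "ack" or "nack", the confirm the publisher would see."""
        if len(self.messages) >= self.max_length:
            if self.overflow == "reject-publish":
                return "nack"  # new message discarded, never dead-lettered
            oldest = self.messages.popleft()  # drop-head: evict the oldest message
            if self.dead_letters is not None:
                self.dead_letters.append(oldest)
        self.messages.append(message)
        return "ack"


dlx = []
drop_head = BoundedQueue(2, "drop-head", dead_letters=dlx)
print([drop_head.publish(m) for m in ("a", "b", "c")], list(drop_head.messages), dlx)
# prints: ['ack', 'ack', 'ack'] ['b', 'c'] ['a']

unused_dlx = []
reject = BoundedQueue(2, "reject-publish", dead_letters=unused_dlx)
print([reject.publish(m) for m in ("a", "b", "c")], list(reject.messages), unused_dlx)
# prints: ['ack', 'ack', 'nack'] ['a', 'b'] []
```

With drop-head the publisher of "c" sees an ack even though "a" was evicted, while with reject-publish the same publisher sees a nack and the queue contents are untouched.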

In the next part

Next in the series is a bulk message publisher with a message state tracker. It will perform retries and track the state of each message for you. This can be a little tricky because BasicPublish is asynchronous, and if you don't wait for the confirm for each message then you need a way of correlating each call to BasicPublish with an event handler firing.

We'll also demonstrate the dangers of republishing messages and look at some performance considerations.