Most places I look recommend creating a "retry" topic to prevent data loss. If a consumer fails, it should send the message to the "retry" topic, where it waits a set period of time before being sent back to the "main" topic.
Isn't this an anti-pattern? When the message goes back to the "main" topic, every service subscribed to the "main" topic would reprocess it, even though only one of those services failed to process it initially.
Is there a conventional way of solving this such as putting the clientId in the headers for messages that are the result of a retry? Am I missing something?
Best Answer
Dead-letter queues (DLQs), in themselves, are not an anti-pattern. Cycling messages back through the main topic might be, but that is subjective.
The alternative would be to "stop the world" and update the consumer code to resolve the errors before the topic's retention policy deletes the messages you care about. Or, make your downstream consumers also read from the DLQ topic(s), but handle those messages differently from the ones on the main topic.
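As a rough illustration of the second option, here is a minimal sketch of a consumer that dispatches on the topic a record came from. It models records with a plain dataclass rather than a real Kafka client, and the topic names `orders` and `orders.dlq`, as well as the `process:`/`repair:` results, are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical topic names, chosen only for illustration.
MAIN_TOPIC = "orders"
DLQ_TOPIC = "orders.dlq"


@dataclass(frozen=True)
class Record:
    """Stand-in for a consumed record: the topic it came from and its payload."""
    topic: str
    value: str


def handle(record: Record) -> str:
    """Route a record to a different code path depending on its source topic."""
    if record.topic == MAIN_TOPIC:
        # Normal path for messages seen for the first time.
        return f"process:{record.value}"
    if record.topic == DLQ_TOPIC:
        # Separate path for previously failed messages,
        # e.g. more lenient parsing or extra logging.
        return f"repair:{record.value}"
    raise ValueError(f"unexpected topic: {record.topic}")
```

Because failed messages never re-enter the main topic, services that handled them successfully the first time do not see them again.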
"Is there a conventional way of solving this such as putting the clientId in the headers"
Maybe, if you wanted to track lineage somehow. But re-introducing those previously bad messages into the main topic would add lag and cause ordering issues that interfere with the "good" messages.
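For reference, the header idea from the question could look roughly like the sketch below. It is not an established convention: the header name `retry-for-client` and both helper functions are made up for illustration, and records are modelled with a plain dataclass whose headers are (str, bytes) pairs, which is an assumption about how your client library exposes them.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical header name; not a Kafka standard.
RETRY_HEADER = "retry-for-client"

Header = Tuple[str, bytes]


@dataclass(frozen=True)
class Record:
    """Stand-in for a record: a payload plus (key, value) header pairs."""
    value: bytes
    headers: Tuple[Header, ...] = ()


def as_retry(record: Record, failed_client_id: str) -> Record:
    """Copy a record, tagging it with the client that failed to process it."""
    kept = tuple(h for h in record.headers if h[0] != RETRY_HEADER)
    tag = (RETRY_HEADER, failed_client_id.encode("utf-8"))
    return Record(record.value, kept + (tag,))


def should_process(record: Record, my_client_id: str) -> bool:
    """Untagged records are for everyone; tagged ones only for the named client."""
    for key, val in record.headers:
        if key == RETRY_HEADER:
            return val.decode("utf-8") == my_client_id
    return True
```

Every subscriber still has to read and discard the retried message, and the lag and ordering concerns above are unchanged; the header only prevents duplicate processing.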
Related question
What is the best practice to retry messages from Dead letter Queue for Kafka