Early in my career, I thought queues were just a way to “do stuff later.” Background jobs. Email sending. That kind of thing. They felt like an optimization — nice to have, not essential.
I was wrong. These days, I think queues are the single most underappreciated piece of infrastructure in backend systems. Not because they’re complex (they’re not), but because of how many problems they quietly solve.
The Problem with Synchronous Calls
Let’s start with a common pattern. Service A needs to tell Service B that something happened.
# Service A
def process_order(order):
db.save_order(order)
response = requests.post("http://service-b/notify", json=order.to_dict())
if response.status_code != 200:
raise Exception("Service B failed")
return order
This works fine when both services are healthy. But now consider what happens when they’re not:
- Service B is down. Your order processing fails even though it has nothing to do with notifications.
- Service B is slow. Your request hangs, your thread pool fills up, and Service A starts rejecting all requests.
- Traffic spikes. Both services get hammered. Service A can scale, but Service B can’t keep up. Requests start timing out.
The fundamental problem: synchronous calls create tight coupling. Service A’s availability depends on Service B’s availability, and there’s no buffer for traffic differences.
Add a Queue Between Them
# Service A — just publish and move on
def process_order(order):
db.save_order(order)
queue.publish("order.processed", order.to_dict())
return order
# Service B — consume at its own pace
def handle_order_event(event):
send_notification(event)
That’s it. One small change, and you get:
- Decoupling. Service A doesn’t know or care about Service B. It publishes an event and moves on.
- Resilience. If Service B goes down, messages pile up in the queue. When it comes back, it processes the backlog. No data loss.
- Backpressure. Service B consumes at whatever rate it can handle. If traffic spikes, the queue absorbs the burst.
Backpressure Is the Key
Backpressure is the concept I wish someone had explained to me earlier. It’s simple: when a downstream system can’t keep up, the pressure needs to go somewhere. Without a queue, it goes to the caller — in the form of timeouts, errors, and cascading failures.
With a queue, the pressure goes to the queue itself. Messages stack up, but nobody crashes. The consumer chews through the backlog at a sustainable rate.
Traffic spike:
┌──────────┐
Requests → │ Queue │ → Consumer (steady rate)
100/s │ (buffer) │ 10/s
│ depth: │
│ growing │
└──────────┘
After spike:
┌──────────┐
Requests → │ Queue │ → Consumer (catches up)
5/s │ (buffer) │ 10/s
│ depth: │
│ draining │
└──────────┘
I’ve seen this pattern save production systems multiple times. A marketing email goes out, traffic spikes 10x, and the system survives because there’s a queue between the API and the heavy processing. Without the queue, the whole thing would fall over.
Retry Semantics Come Free
One of the best things about queues: retries are built into the model.
With synchronous calls, you have to build retry logic yourself — exponential backoff, jitter, max attempts, circuit breakers. It’s a lot of code, and it’s easy to get wrong.
With a queue (especially Kafka), if a consumer fails to process a message, it simply doesn’t commit the offset. The message stays in the queue and gets reprocessed. You don’t need to build retry infrastructure — it’s inherent to how consumers work.
# Kafka consumer — automatic retry via offset management
def consume():
for message in consumer:
try:
process(message)
consumer.commit() # success, move on
except TransientError:
# don't commit — message will be redelivered
log.warning(f"Will retry: {message.key}")
except PermanentError:
# send to dead-letter queue, commit, move on
dead_letter.publish(message)
consumer.commit()
Of course, this means your consumers need to be idempotent. Processing the same message twice should have the same result as processing it once. That’s the trade-off — but it’s a much easier problem to solve than building a full retry framework.
When Queues Aren’t the Answer
Queues aren’t always right. Here’s when I don’t use them:
- When you need an immediate response. If the user is waiting for the result of an operation, a queue adds latency that a direct call doesn’t. You can’t tell a user “your payment is being processed, results in 30 seconds” for an e-commerce checkout.
- When the system is simple. Two services, low traffic, both maintained by the same team? A direct HTTP call is fine. Don’t add Kafka for a system that handles 50 requests per minute.
- When ordering is critical and complex. Queues can preserve ordering within a partition, but if you need global ordering across multiple data types, you might be adding more complexity than you’re solving.
The Mental Model
I think of queues as shock absorbers. In a car, the road is uneven — bumps, potholes, sudden drops. Without shock absorbers, every bump transfers directly to the passengers. With them, the ride is smooth even when the road isn’t.
Your production traffic is the road. It’s uneven — spikes, bursts, sudden surges from a marketing campaign or a bot crawling your API. Without a queue, every spike transfers directly to your services. With a queue, the spike is absorbed, and your consumers keep processing at a steady, healthy rate.
The best part? This is boring infrastructure. Kafka, RabbitMQ, SQS — all battle-tested, well-documented, and understood by thousands of engineers. You’re not doing anything clever. You’re just putting a buffer in the right place.
And sometimes, that’s all it takes to turn a fragile system into a reliable one.