Thuan: Alex, I went to a tech conference last month. Every other talk mentioned “event-driven architecture,” “CQRS,” or “the Saga pattern.” People talked about them like they’re essential. But when I got home and looked at my codebase… it’s just a REST API calling a database. Am I behind?
Alex: No. You’re normal. Here’s a dirty secret of the tech industry: most applications don’t need these patterns. They’re powerful solutions to specific problems. But if you don’t have those problems, adding these patterns just makes your code harder to understand.
Thuan: So when are they useful? Help me separate the buzz from the practical.
Event-Driven Architecture: Don’t Call Me, I’ll Call You
Alex: Let’s start with event-driven architecture. In a traditional system, when something happens, one service calls another service. The order service calls the payment service. The payment service calls the notification service. The notification service calls the analytics service. It’s a chain.
Thuan: Like a phone tree. I call you, you call her, she calls him.
Alex: Exactly. And what happens when one person in the phone tree doesn’t pick up?
Thuan: The chain breaks. Everyone after that person gets nothing.
Alex: Right. And in software, the calling service has to wait for a response. If the payment service is slow, the order service is slow too. If the notification service is down, the entire chain might fail.
Event-driven architecture flips this model. Instead of calling services directly, you announce what happened. The order service doesn’t call the payment service. It publishes an event: “Hey everyone, order 456 was just placed.” The payment service, notification service, and analytics service all listen for that event and react independently.
Thuan: So it’s like posting on a bulletin board instead of making phone calls?
Alex: Perfect analogy. The order service doesn’t know or care who’s listening. It just posts on the bulletin board. If the notification service is down, the order service doesn’t know and doesn’t care. As long as the message broker keeps the event until it is read, the notification service will pick it up from the bulletin board when it comes back up.
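To make the bulletin board idea concrete, here is a minimal in-process sketch. The `EventBus` class, the `OrderPlaced` event name, and the three handlers are illustrative assumptions, not a real broker API. A production system would typically use a durable message broker so that a service that is down can catch up later; this toy version delivers synchronously and keeps nothing.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """A toy in-process bulletin board: publishers post, subscribers react."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher never names a recipient; it only announces what happened.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
reactions = []

# Three hypothetical services register interest in the same event.
bus.subscribe("OrderPlaced", lambda e: reactions.append(f"payment: charge order {e['order_id']}"))
bus.subscribe("OrderPlaced", lambda e: reactions.append(f"notification: email for order {e['order_id']}"))
bus.subscribe("OrderPlaced", lambda e: reactions.append(f"analytics: record order {e['order_id']}"))

# The order service publishes once and knows nothing about the listeners.
bus.publish("OrderPlaced", {"order_id": 456})
```

Adding a fourth listener means one more `subscribe` call; the publishing code does not change, which is the decoupling Alex describes.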
Thuan: That sounds great. What’s the downside?
Alex: Complexity and debugging. In a direct call system, you can trace the flow: order → payment → notification. In an event-driven system, the flow is implicit. Events fly around, and you need tools to track which service processed which event. When something goes wrong, figuring out “which service failed to handle which event” is much harder than following a stack trace.
Thuan: What tools help with that?
Alex: Distributed tracing — Jaeger or Zipkin. Event stores — keeping a log of all events and their processing status. Dead letter queues — when a service fails to process an event, the event goes to a special queue for manual investigation instead of being lost.
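The dead letter queue idea can be sketched in a few lines. This is a simplified illustration, assuming a retry budget of three attempts and a hypothetical `send_email` handler; real brokers usually provide dead letter queues as a configuration option rather than hand-written code.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget for this sketch


def process_with_dlq(events, handler, max_attempts=MAX_ATTEMPTS):
    """Try each event up to max_attempts times; park persistent failures in a DLQ."""
    dead_letters = deque()
    processed = []
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                processed.append(event["id"])
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Keep the event and the reason so a human can investigate later.
                    dead_letters.append(
                        {"event": event, "error": str(exc), "attempts": attempt}
                    )
    return processed, dead_letters


def send_email(event):
    # Hypothetical handler: fails for events that carry no address.
    if not event.get("email"):
        raise ValueError("missing email address")


events = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
]
processed, dead_letters = process_with_dlq(events, send_email)
```

The failing event does not block the ones behind it, and it is not lost: it sits in `dead_letters` with the error message attached.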
When to Use (and Not Use) Events
Thuan: Give me the practical guide. When should I switch to events?
Alex: Use events when:
Multiple services need to react to the same thing. Order placed → payment processes, inventory updates, email sends, analytics records. In a direct call model, the order service would need to know about all these services and call them sequentially. With events, the order service publishes once and every service handles it independently.
You need to decouple services. If Service A calling Service B creates a dependency (A can’t deploy without B being up), events break that dependency, and the two services can evolve independently.
You need asynchronous processing. The user doesn’t need to wait for the email to send or the analytics to record. They need to know their order was placed. Event-driven lets you acknowledge the user immediately and process everything else in the background.
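The "acknowledge immediately, process in the background" point can be sketched with a queue and a single worker thread. The `place_order` function and the job names are assumptions for illustration; in a real system the queue would usually be a message broker and the worker a separate service.

```python
import queue
import threading

background_jobs = queue.Queue()
completed = []


def worker():
    """Drain background jobs until a None sentinel arrives."""
    while True:
        job = background_jobs.get()
        if job is None:
            background_jobs.task_done()
            break
        completed.append(f"{job['task']} for order {job['order_id']}")
        background_jobs.task_done()


def place_order(order_id):
    # Enqueue the slow work and return right away; the caller never waits for it.
    background_jobs.put({"task": "send_email", "order_id": order_id})
    background_jobs.put({"task": "record_analytics", "order_id": order_id})
    return {"order_id": order_id, "status": "placed"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = place_order(456)   # the user gets this answer immediately

background_jobs.join()        # only here so the sketch can inspect the result
background_jobs.put(None)     # tell the worker to stop
thread.join()
```

The response is built before any email or analytics work has run, which is the whole point of the asynchronous case.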
Don’t use events when:
You need a synchronous response. If the user needs to see their search results right now, don’t use events. A direct API call to the search service is correct.
You have a simple system. Three services, one team, simple workflows. Events add infrastructure (message broker), complexity (eventual consistency), and cognitive load (implicit flows). If direct calls work fine, keep them.
CQRS: Read and Write Are Different Problems
Thuan: OK, next buzzword. CQRS. I know it stands for Command Query Responsibility Segregation, but what does that actually mean?
Alex: It means splitting your system into two parts: one for writing data and one for reading data. Different code, potentially different databases, optimized for their specific job.
Thuan: Why would you do that? One database handles both reads and writes.
Alex: In simple applications, yes. One database, one set of models, both reading and writing use the same code. But here’s when that breaks down.
Imagine an e-commerce platform. Writing is relatively simple: add an item to cart, place an order, update inventory. These operations are straightforward — you’re modifying rows in a database.
Reading is complex. Show the user their personalized dashboard: recent orders with product images, estimated delivery times, recommended products based on purchase history, loyalty points, and promotional banners. Constructing this page requires joining six tables, running recommendation algorithms, and aggregating data from multiple services.
Thuan: So the read path and the write path have completely different requirements?
Alex: Exactly. The write path needs strong consistency — ACID transactions. The read path needs speed — denormalized data, pre-computed views, caching. Trying to optimize one database for both is like trying to make a car that’s both a great race car and a great off-road vehicle. Possible, but you’ll compromise on both.
Thuan: So with CQRS, you’d have one database optimized for writes and another optimized for reads?
Alex: One common pattern: a relational database for writes — PostgreSQL with normalized tables. And a search engine or document store for reads — Elasticsearch or MongoDB with denormalized views. When a write happens, an event is published. A service listens for that event and updates the read model.
Thuan: Isn’t that just caching with extra steps?
Alex: Ha! Sort of. But the read model isn’t just a cache — it’s a completely different data shape. The write model stores normalized, consistent data. The read model stores denormalized, query-optimized data that’s shaped exactly for what the UI needs. One write event might update multiple read models — the dashboard view, the order history view, the admin analytics view.
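Here is a small sketch of that write model / read model split, using plain dictionaries in place of real databases. The class names, the `OrderPlaced` event shape, and the `project` function are all hypothetical; the point is only that one write event fans out into several differently shaped read views.

```python
from dataclasses import dataclass, field


@dataclass
class WriteModel:
    """Normalized store: orders reference products by id (stands in for PostgreSQL)."""
    products: dict = field(default_factory=dict)  # product_id -> name
    orders: dict = field(default_factory=dict)    # order_id -> (user_id, product_id)


@dataclass
class ReadModels:
    """Denormalized views shaped for specific screens (stands in for a document store)."""
    order_history: dict = field(default_factory=dict)     # user_id -> display rows
    sales_by_product: dict = field(default_factory=dict)  # product name -> count


def place_order(write, order_id, user_id, product_id):
    """Command side: store the normalized row and return the event describing it."""
    write.orders[order_id] = (user_id, product_id)
    return {"type": "OrderPlaced", "order_id": order_id,
            "user_id": user_id, "product_id": product_id}


def project(event, write, read):
    """Query side: one event updates every read model that cares about it."""
    name = write.products[event["product_id"]]
    read.order_history.setdefault(event["user_id"], []).append(
        {"order_id": event["order_id"], "product": name}
    )
    read.sales_by_product[name] = read.sales_by_product.get(name, 0) + 1


write = WriteModel(products={"p1": "Coffee Grinder"})
read = ReadModels()

event = place_order(write, order_id=456, user_id="thuan", product_id="p1")
project(event, write, read)
```

The order history view already contains the product name, so rendering it needs no join. In a real deployment `project` would run asynchronously, which is where eventual consistency comes from.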
When CQRS Is Overkill
Thuan: This feels complex. When is it worth the effort?
Alex: CQRS is worth it when:
Read and write patterns are vastly different. Writes are simple CRUD, but reads are complex aggregations across multiple data sources. Netflix recommendation pages, social media feeds, analytics dashboards — these benefit from CQRS.
Read and write scale differently. Reads are 1,000 times more frequent than writes. You want to scale the read model independently — add more read replicas, more search nodes — without affecting the write path.
CQRS is overkill when:
Your reads and writes are similar. A blog with “create post” and “show post” — the write model and read model are basically the same. CQRS adds complexity for zero benefit.
You have one team. The coordination overhead of maintaining two models, an event pipeline, and eventual consistency is significant. For a small team, it’s a tax on productivity.
The Saga Pattern: Distributed Transactions Without the Pain
Thuan: Last pattern — Saga. We touched on this when we talked about microservices. Can you go deeper?
Alex: Sure. The Saga pattern solves the problem of transactions across multiple services.
In a monolith with one database, transactions are easy. “Book a flight, reserve a hotel, charge the credit card.” If the credit card fails, you cancel the flight and hotel. The database rollback handles everything.
In microservices, each step is a different service with a different database. There’s no single database transaction that spans all of them. So how do you handle “charge the credit card failed — undo the flight and hotel”?
Thuan: That’s the saga — a sequence of steps with compensation logic?
Alex: Exactly. A saga is a sequence of local transactions. Each step has a compensating action — an undo. If step 3 fails, you execute the compensating actions for steps 2 and 1.
Step 1: Flight service books a seat → Compensation: cancel the booking
Step 2: Hotel service reserves a room → Compensation: cancel the reservation
Step 3: Payment service charges the card → If this fails, trigger compensations
When step 3 fails: the payment service announces “payment failed.” The hotel service cancels the reservation. The flight service cancels the booking. Each service handles its own rollback.
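The steps above can be sketched as a list of (action, compensation) pairs that are undone in reverse order when something fails. This is a simplified single-process illustration: `run_saga`, `PaymentDeclined`, and the in-memory `bookings` set are assumptions, and the compensations here are assumed to always succeed.

```python
class PaymentDeclined(Exception):
    """Raised by the hypothetical payment step when the card is refused."""


def run_saga(steps, log):
    """Run steps in order; on failure, undo the completed steps in reverse."""
    completed = []
    for name, action, compensation in steps:
        try:
            action()
            log.append(f"done: {name}")
            completed.append((name, compensation))
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undone: {done_name}")
            return False
    return True


bookings = set()  # stands in for the state held by the separate services


def charge_card():
    raise PaymentDeclined("card declined")


steps = [
    ("flight", lambda: bookings.add("flight"), lambda: bookings.discard("flight")),
    ("hotel", lambda: bookings.add("hotel"), lambda: bookings.discard("hotel")),
    ("payment", charge_card, lambda: None),
]

log = []
succeeded = run_saga(steps, log)
```

After the failed payment, `bookings` is empty again: the hotel is cancelled first, then the flight, mirroring the order in which they were made.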
Thuan: That sounds straightforward. What makes it hard?
Alex: Three things.
Timing. What if the compensation for step 1 fails? Now you have a partially compensated state. You need retry logic, timeout handling, and potentially manual intervention.
Visibility. When a saga is in progress, data is in an inconsistent state. The flight is booked, the hotel is reserved, but payment hasn’t been processed yet. What does the user see? “Your booking is being processed” — that’s intentionally vague because the system is mid-saga.
Complexity. A 3-step saga has up to 3 compensations. A 10-step saga has up to 10, and each can fail. The number of failure combinations you have to handle grows much faster than the number of steps. This is why experienced architects keep sagas short: 3 to 5 steps maximum.
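One way to handle a compensation that itself fails is to retry it a bounded number of times and then flag it for a human. The sketch below assumes a budget of three attempts and two hypothetical cancel functions; a real implementation would normally add a delay or backoff between attempts, which is left out here to keep the example deterministic.

```python
def compensate_with_retry(undo, max_attempts=3):
    """Retry a compensating action; escalate to manual handling if it keeps failing."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            undo()
            return {"status": "compensated", "attempts": attempt}
        except Exception as exc:
            last_error = str(exc)
    return {
        "status": "needs_manual_intervention",
        "attempts": max_attempts,
        "error": last_error,
    }


calls = {"count": 0}


def flaky_cancel_flight():
    # Hypothetical compensation that times out twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("flight service timed out")


def broken_cancel_hotel():
    # Hypothetical compensation that never succeeds.
    raise ConnectionError("hotel service unreachable")


flight_result = compensate_with_retry(flaky_cancel_flight)
hotel_result = compensate_with_retry(broken_cancel_hotel)
```

The second result is the partially compensated state Alex warns about: the saga cannot finish on its own, so the record has to surface somewhere a person will see it.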
Two Types of Saga
Thuan: I’ve heard about choreography and orchestration sagas. What’s the difference?
Alex: Choreography is like a dance where each dancer knows their own steps. No conductor. Each service listens for events and reacts. Flight service books → publishes “flight booked” → hotel service hears it and reserves → publishes “hotel reserved” → payment service hears it and charges.
Orchestration uses a conductor — a central saga coordinator. The coordinator tells each service what to do and tracks the progress. “Flight service, book the seat.” “Done.” “Hotel service, reserve the room.” “Done.” “Payment service, charge the card.” “Failed.” “Hotel service, cancel. Flight service, cancel.”
Thuan: Orchestration sounds easier to understand.
Alex: It is. And easier to debug because there’s a central log of what happened. Choreography is more decoupled — no single point of failure — but harder to understand because the flow is distributed across multiple services. For most teams, I recommend orchestration for sagas. The visibility and debugging advantages outweigh the minor coupling of a central coordinator.
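For contrast with a central coordinator, here is a rough sketch of the choreography style, where each service only knows which event it reacts to and which event it emits. The event names and handler functions are invented for illustration, and the payment step is hard-coded to fail so the cancellation path is visible.

```python
from collections import defaultdict

subscribers = defaultdict(list)
timeline = []  # kept only so the sketch can show the order of events


def subscribe(event_type, handler):
    subscribers[event_type].append(handler)


def publish(event_type, booking_id):
    timeline.append(event_type)
    for handler in subscribers[event_type]:
        handler(booking_id)


# Each service knows its own step and nothing about the overall flow.
def flight_on_booking_requested(booking_id):
    publish("FlightBooked", booking_id)


def hotel_on_flight_booked(booking_id):
    publish("HotelReserved", booking_id)


def payment_on_hotel_reserved(booking_id):
    publish("PaymentFailed", booking_id)  # simulate a declined card


def hotel_on_payment_failed(booking_id):
    publish("HotelCancelled", booking_id)


def flight_on_payment_failed(booking_id):
    publish("FlightCancelled", booking_id)


subscribe("BookingRequested", flight_on_booking_requested)
subscribe("FlightBooked", hotel_on_flight_booked)
subscribe("HotelReserved", payment_on_hotel_reserved)
subscribe("PaymentFailed", hotel_on_payment_failed)
subscribe("PaymentFailed", flight_on_payment_failed)

publish("BookingRequested", 456)
```

No single function describes the whole booking flow; it only emerges from the subscriptions. That is the debugging cost of choreography, and the reason the `timeline` list had to be bolted on to see what happened.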
Real-World Example: Combining All Three
Thuan: Can you show me a system that uses all three patterns together?
Alex: Sure. An e-commerce checkout system.
Event-driven: When a user places an order, the order service publishes an “OrderPlaced” event. The email service sends a confirmation. The analytics service records the sale. The inventory service reserves the items. All happening independently, triggered by one event.
CQRS: The product catalog uses CQRS. Writes — adding products, updating prices — go to a PostgreSQL database. Reads — searching, filtering, browsing — use an Elasticsearch index. An event pipeline syncs changes from PostgreSQL to Elasticsearch.
Saga: The checkout process is a saga. Reserve inventory → process payment → confirm order → schedule shipping. If payment fails, inventory is released. If shipping can’t be scheduled, the payment is refunded and inventory is released.
Thuan: And without these patterns?
Alex: Without them, a small e-commerce site works perfectly fine with a monolith, one database, and direct function calls. These patterns are solutions to scale and complexity problems. You adopt them when the simple approach genuinely breaks, not before.
Decision Framework
Thuan: Give me the cheat sheet. When do I reach for each pattern?
Alex:
Event-driven architecture: When multiple services need to react to the same event. When you need to decouple services that change independently. When async processing improves user experience.
CQRS: When read and write patterns are fundamentally different. When you need to optimize reads and writes independently. When read volume is 100 times write volume or more.
Saga pattern: When you need transactions across multiple services. When you need compensating logic for partial failures. Keep sagas to 5 steps or fewer.
None of the above: When your system is simple, your team is small, and direct calls work fine. The best architecture is the simplest one that meets your needs.
Key Takeaways You Can Explain to Anyone
Thuan: Takeaway time.
Alex:
- Event-driven = bulletin board, not phone calls. Services announce what happened instead of calling each other. Loose coupling, independent processing, but harder to debug.
- CQRS = separate models for reading and writing. Optimize each independently. Powerful for complex reads, overkill for simple CRUD.
- Saga = distributed transaction with undo buttons. Each step has a compensating action. Keep sagas short. Orchestration is usually better than choreography.
- These patterns solve specific problems. They’re not universally better than the simple approach. Use them when you hit the specific pain point they address.
- Start simple. Add patterns when the simplicity breaks. A well-structured monolith with a single database is better than a poorly implemented event-driven microservices architecture.
Thuan: I feel so much better. Half the conference talks made me feel like I was doing everything wrong. Turns out I was doing things simply — which is usually right.
Alex: Simplicity is the ultimate sophistication. That line is usually attributed to Leonardo da Vinci, and he definitely would have been a great architect.
Thuan: Last topic next time — the career one. From senior developer to tech lead. What actually changes?
Alex: That’s the most important conversation of the whole series.
This is Part 11 of the Tech Coffee Break series — casual conversations about real tech concepts, designed for listening and learning.
Next up: Part 12 — From Senior Dev to Tech Lead — What Changes?