The Core Problem: Services Need to Talk to Each Other
In a monolith, function calls are in-process, near-instant, and never fail due to network issues. In a microservices architecture, every cross-service interaction is a network call — subject to latency, packet loss, service downtime, and partial failures. Choosing the right communication pattern for each interaction type is one of the most consequential microservices architecture decisions. The wrong choice creates tight coupling, cascading failures, and hard-to-debug distributed transactions.
Synchronous Communication
REST (HTTP)
REST over HTTP is the default choice for synchronous service-to-service calls. It is universally understood, has excellent tooling, and works well for request-response interactions where the caller needs an immediate result.
// Node.js / TypeScript: typed HTTP client with retry logic
import axios, { AxiosInstance } from 'axios'
import axiosRetry from 'axios-retry'
interface Order { id: string; userId: string; total: number } // illustrative shape
function createServiceClient(baseURL: string): AxiosInstance {
  const client = axios.create({
    baseURL,
    timeout: 5000, // 5s per attempt: never wait indefinitely for a downstream service
    headers: { 'Content-Type': 'application/json' },
  })
  axiosRetry(client, {
    retries: 3,
    // Exponential backoff: roughly 200ms, 400ms, 800ms, plus up to 20% random jitter
    retryDelay: axiosRetry.exponentialDelay,
    // Retries network errors, plus 5xx responses on idempotent methods (GET, PUT, DELETE, ...).
    // Blindly retrying a POST on 5xx can repeat a side effect, e.g. create the same order twice.
    retryCondition: axiosRetry.isNetworkOrIdempotentRequestError,
  })
  return client
}
const orderService = createServiceClient('http://order-service:8080/api')
const orderId = 'ord-123' // example ID
const { data: order } = await orderService.get<Order>(`/orders/${orderId}`)
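axios-retry makes the retry decision internally. To show what that policy amounts to, here is a hedged sketch in plain TypeScript: `shouldRetry` and `backoffDelayMs` are hypothetical helpers, the idempotent-method list follows HTTP semantics, and the "full jitter" delay formula is one common choice rather than what axios-retry necessarily computes.

```typescript
// Sketch of a retry decision plus backoff computation (not axios-retry's internals).
// Assumption: only idempotent HTTP methods are retried, on network errors or 5xx responses.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE'])

// status is undefined when no HTTP response arrived (connection refused, reset, DNS failure)
function shouldRetry(method: string, status: number | undefined, retryNumber: number, maxRetries: number): boolean {
  if (retryNumber > maxRetries) return false
  if (!IDEMPOTENT_METHODS.has(method.toUpperCase())) return false
  return status === undefined || status >= 500
}

// "Full jitter" backoff: a random delay in [0, min(cap, base * 2^retryNumber)).
// Randomizing spreads retries out so many clients don't hit a recovering service in lockstep.
function backoffDelayMs(
  retryNumber: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** retryNumber))
}

// Example: plan the delays for up to 3 retries of a GET that keeps returning 503
for (let retry = 1; shouldRetry('GET', 503, retry, 3); retry++) {
  console.log(`retry ${retry} after ${backoffDelayMs(retry)}ms`)
}
```

Injecting `random` keeps the delay deterministic in tests, and the cap stops late retries from waiting unreasonably long.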
gRPC: High-Performance Binary Protocol
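gRPC uses Protocol Buffers (a compact binary serialization format) over HTTP/2, which typically makes it faster than JSON over HTTP/1.1 for high-throughput internal service communication: payloads are smaller, encoding and decoding are cheaper, and many concurrent calls are multiplexed over one long-lived connection. Define services in .proto files and generate typed clients in any supported language: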
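// user.proto
syntax = "proto3";
service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User); // Server-side streaming
  rpc UpdateUser (UpdateUserRequest) returns (User);
}
message GetUserRequest { string id = 1; }
// Every message referenced above must be defined for protoc to compile; these fields are illustrative
message ListUsersRequest { int32 page_size = 1; string page_token = 2; }
message UpdateUserRequest { User user = 1; }
message User {
  string id = 1;
  string name = 2;
  string email = 3;
  string role = 4;
}
// Generated TypeScript client usage (assumes ts-proto output for @grpc/grpc-js; import path is illustrative):
import { credentials } from '@grpc/grpc-js'
import { UserServiceClient, User } from './generated/user'
// createInsecure() = plaintext; acceptable only on a trusted network or behind mesh-provided mTLS
const client = new UserServiceClient('user-service:50051', credentials.createInsecure())
const user = await new Promise<User>((resolve, reject) =>
  client.getUser({ id: userId }, (err, res) => (err ? reject(err) : resolve(res))))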
Use gRPC for internal calls where latency and throughput matter, for streaming large datasets between services, and in polyglot environments where generating clients from .proto files pays off. Use REST for public-facing APIs (browsers cannot speak native gRPC and need gRPC-Web plus a proxy), when teams are unfamiliar with gRPC, or for simple request-response traffic at moderate throughput.
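To make the "smaller payloads" point concrete, here is a hand-rolled encoder for the subset of the protobuf wire format that the User message above needs (string fields only). It is an illustrative sketch; real services should use code generated by protoc or a library such as protobufjs.

```typescript
// Minimal protobuf wire-format encoder for string fields only (illustrative, not a protobuf library).
// Each field is written as: tag varint, then (for strings) a length varint, then the UTF-8 bytes.

// Base-128 varint: 7 bits per byte, least-significant group first; high bit set = "more bytes follow"
function encodeVarint(value: number): number[] {
  if (!Number.isSafeInteger(value) || value < 0) throw new RangeError('expected a non-negative safe integer')
  const bytes: number[] = []
  let v = value
  while (v > 0x7f) {
    bytes.push((v % 128) | 0x80)
    v = Math.floor(v / 128)
  }
  bytes.push(v)
  return bytes
}

const WIRE_TYPE_LEN = 2 // length-delimited: strings, bytes, embedded messages

function encodeStringField(fieldNumber: number, value: string): number[] {
  const payload = Array.from(new TextEncoder().encode(value))
  // Tag = (field number << 3) | wire type; field numbers, not names, go on the wire
  const tag = encodeVarint(fieldNumber * 8 + WIRE_TYPE_LEN)
  return [...tag, ...encodeVarint(payload.length), ...payload]
}

// Encode the User message from user.proto (fields 1-4) and compare with the JSON equivalent
const user = { id: 'u1', name: 'Ada', email: 'ada@example.com', role: 'admin' }
const protoBytes = [
  ...encodeStringField(1, user.id),
  ...encodeStringField(2, user.name),
  ...encodeStringField(3, user.email),
  ...encodeStringField(4, user.role),
]
const jsonBytes = new TextEncoder().encode(JSON.stringify(user)).length
console.log(`protobuf: ${protoBytes.length} bytes, JSON: ${jsonBytes} bytes`)
```

For this record the sketch reports 33 bytes for protobuf versus 65 for JSON; most of the saving comes from replacing quoted field names with one-byte tags.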
Asynchronous Communication: Message Queues
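Asynchronous messaging decouples services in time: the sender hands a message to a broker and does not wait for the receiver to process it. Because the broker buffers messages, a slow or temporarily unavailable consumer does not fail the producer's request, and producers and consumers can be scaled and deployed independently.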
Kafka: Distributed Event Streaming
Kafka is a distributed log optimized for high-throughput, durable event streams. Messages are retained for a configurable period (7 days by default) whether or not they have been consumed, so they can be replayed. Multiple consumer groups can independently read the same topic, each tracking its own position (offset) in every partition.
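// Producer: publishing events
import { Kafka, CompressionTypes } from 'kafkajs'
const kafka = new Kafka({ clientId: 'order-service', brokers: ['kafka:9092'] })
// Idempotent producer: the broker discards duplicates caused by the producer's own retries.
// It does not stop the application itself from sending the same event twice.
const producer = kafka.producer({ idempotent: true })
await producer.connect() // required before send()
// `order` is the domain object that was just persisted (illustrative)
await producer.send({
  topic: 'order.events',
  // GZIP is built into kafkajs; Snappy, LZ4 and ZSTD require registering an extra codec package
  compression: CompressionTypes.GZIP,
  messages: [{
    key: order.id, // Partitioned by order ID: same order → same partition → ordered
    value: JSON.stringify({
      type: 'ORDER_PLACED',
      orderId: order.id,
      userId: order.userId,
      total: order.total,
      timestamp: new Date().toISOString(),
    }),
  }],
})
// Consumer: processing events
const consumer = kafka.consumer({ groupId: 'notification-service' })
await consumer.connect() // required before subscribe()
await consumer.subscribe({ topic: 'order.events', fromBeginning: false })
await consumer.run({
  eachMessage: async ({ message }) => {
    if (!message.value) return // tombstone or empty message
    const event = JSON.parse(message.value.toString())
    // Delivery is at-least-once: after a crash or rebalance the same event can arrive again,
    // so the handler should be idempotent (sendOrderConfirmationEmail is a hypothetical helper)
    if (event.type === 'ORDER_PLACED') {
      await sendOrderConfirmationEmail(event.userId, event.orderId)
    }
  },
})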
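The log semantics described above (keyed partitioning, retention independent of consumption, per-group offsets, replay) can be modeled in a few lines. This is a toy, in-memory sketch for intuition only: `PartitionedLog` is hypothetical, and FNV-1a stands in for the murmur2 hash that Kafka's default partitioner actually uses.

```typescript
// Toy in-memory model of a Kafka-style topic (illustrative only; not kafkajs and not Kafka's algorithm).
// Assumption: FNV-1a stands in for Kafka's murmur2 default partitioner.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

interface LogRecord { offset: number; key: string; value: string }

class PartitionedLog {
  private readonly partitions: LogRecord[][]
  // Each consumer group tracks its own next offset per partition
  private readonly groupOffsets = new Map<string, number[]>()

  constructor(partitionCount: number) {
    this.partitions = Array.from({ length: partitionCount }, () => [] as LogRecord[])
  }

  // Same key → same partition, so records for one key keep their append order
  append(key: string, value: string): { partition: number; offset: number } {
    const partition = fnv1a(key) % this.partitions.length
    const log = this.partitions[partition]
    const record = { offset: log.length, key, value }
    log.push(record)
    return { partition, offset: record.offset }
  }

  // Returns records after the group's position and advances it; records are NOT deleted
  poll(groupId: string, partition: number): LogRecord[] {
    const offsets = this.offsetsFor(groupId)
    const records = this.partitions[partition].slice(offsets[partition])
    offsets[partition] = this.partitions[partition].length
    return records
  }

  // Replay: move a group back to an earlier offset of the retained log
  seek(groupId: string, partition: number, offset: number): void {
    this.offsetsFor(groupId)[partition] = offset
  }

  private offsetsFor(groupId: string): number[] {
    let offsets = this.groupOffsets.get(groupId)
    if (!offsets) {
      offsets = this.partitions.map(() => 0) // in this toy, new groups start at the beginning
      this.groupOffsets.set(groupId, offsets)
    }
    return offsets
  }
}
```

Ordering holds only within a partition: two different order IDs may land on different partitions and be consumed in any relative order.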
RabbitMQ: Task Queue Pattern
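RabbitMQ is a traditional message broker with flexible routing via exchanges (direct, topic, fanout, headers). It is best suited to task queues, where each message is delivered to one of several competing workers (the competing consumers pattern), and to cases that need complex routing rules. Unlike Kafka, a message is removed from a classic queue once a worker acknowledges it.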
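import amqp from 'amqplib'
// Queue arguments must be identical in every assertQueue call, or RabbitMQ rejects the declaration
const EMAIL_QUEUE_OPTIONS = {
  durable: true, // durable = queue survives broker restart
  deadLetterExchange: '', // rejected messages are republished via the default exchange...
  deadLetterRoutingKey: 'email-jobs.dlq', // ...which routes them to the queue with this name
}
// Publisher
const conn = await amqp.connect('amqp://rabbitmq:5672')
const channel = await conn.createChannel()
await channel.assertQueue('email-jobs.dlq', { durable: true })
await channel.assertQueue('email-jobs', EMAIL_QUEUE_OPTIONS)
channel.sendToQueue('email-jobs',
  Buffer.from(JSON.stringify({ to: 'user@example.com', template: 'welcome' })),
  { persistent: true } // persistent = message survives broker restart
)
// Worker (consumer)
await channel.assertQueue('email-jobs', EMAIL_QUEUE_OPTIONS)
await channel.prefetch(1) // At most one unacknowledged message per worker at a time
await channel.consume('email-jobs', async (msg) => {
  if (!msg) return // consumer was cancelled by the broker
  const job = JSON.parse(msg.content.toString())
  try {
    await sendEmail(job) // hypothetical helper
    channel.ack(msg) // Acknowledge: remove from queue
  } catch {
    // requeue=false: dead-lettered to email-jobs.dlq (with no dead-letter exchange it would be discarded)
    channel.nack(msg, false, false)
  }
})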
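The example above publishes straight to a queue through the default exchange. The richer routing comes from exchanges; as an illustration of what a topic exchange does with binding patterns, here is a hedged sketch of the matching rules (`topicMatches` and `route` are hypothetical, as are the queue names and routing keys). With amqplib you would declare the real thing with `assertExchange(name, 'topic')` and `bindQueue(queue, exchange, pattern)` and let the broker do the matching.

```typescript
// Sketch of RabbitMQ topic-exchange matching semantics (illustrative; the broker does this for you).
// Routing keys and binding patterns are dot-separated words:
//   '*' matches exactly one word, '#' matches zero or more words.
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split('.')
  const k = routingKey.split('.')
  const match = (i: number, j: number): boolean => {
    if (i === p.length) return j === k.length // pattern consumed: key must be consumed too
    if (p[i] === '#') {
      // Try letting '#' absorb 0, 1, 2, ... words
      for (let next = j; next <= k.length; next++) {
        if (match(i + 1, next)) return true
      }
      return false
    }
    if (j === k.length) return false
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1)
  }
  return match(0, 0)
}

interface Binding { pattern: string; queue: string }

// A message is copied to every queue with at least one matching binding
function route(bindings: Binding[], routingKey: string): string[] {
  const queues = bindings.filter((b) => topicMatches(b.pattern, routingKey)).map((b) => b.queue)
  return [...new Set(queues)]
}

const bindings: Binding[] = [
  { pattern: 'email.*', queue: 'email-jobs' },
  { pattern: 'email.#', queue: 'email-audit' },
  { pattern: '*.urgent', queue: 'pager' },
]
console.log(route(bindings, 'email.welcome'))
```

Because routing happens in the broker, publishers only choose a routing key; adding a new consumer is a matter of binding another queue, with no change to the producer.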
Dead Letter Queues (DLQ)
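A DLQ captures messages that could not be processed after all retries (so-called poison messages). Without one, a failing message is either redelivered over and over (if it is requeued or never acknowledged), wasting worker capacity and delaying other work, or silently dropped (if it is rejected with no dead-letter target). Always configure a DLQ in production, and alert when messages start accumulating in it: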
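// SQS example (AWS CDK v2, inside a Stack or Construct)
import { Duration } from 'aws-cdk-lib'
import { Queue } from 'aws-cdk-lib/aws-sqs'
const dlq = new Queue(this, 'EmailDLQ', { retentionPeriod: Duration.days(14) }) // 14 days is the SQS maximum
const emailQueue = new Queue(this, 'EmailQueue', {
  // Moved to the DLQ once a message has been received 3 times without being deleted
  deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
})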
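The `maxReceiveCount: 3` setting means a message gets three processing attempts before SQS moves it to the DLQ. A small simulation of that rule, assuming a handler that throws on failure (`deliverWithRedrive` is a hypothetical helper; in SQS the receive count and the redrive happen inside the service):

```typescript
// Toy model of redrive-to-DLQ semantics (illustrative; SQS tracks receive counts for you).
// Assumption: each failed attempt leaves the message undeleted, so it is received again later.
type Delivery = { outcome: 'processed' | 'dead-lettered'; attempts: number }

function deliverWithRedrive(
  handle: (body: string) => void,
  body: string,
  maxReceiveCount: number,
): Delivery {
  for (let receiveCount = 1; receiveCount <= maxReceiveCount; receiveCount++) {
    try {
      handle(body)
      return { outcome: 'processed', attempts: receiveCount } // consumer deletes the message
    } catch {
      // Not deleted: it reappears after the visibility timeout and is received again
    }
  }
  // The next receive would exceed maxReceiveCount, so the queue moves it to the DLQ instead
  return { outcome: 'dead-lettered', attempts: maxReceiveCount }
}

// A handler that fails twice, then succeeds (e.g. a briefly unavailable email provider)
let failuresLeft = 2
const flaky = (body: string): void => {
  if (failuresLeft-- > 0) throw new Error(`transient failure for ${body}`)
}
console.log(deliverWithRedrive(flaky, 'welcome-email', 3))
```

The same reasoning helps when picking the number: too low and transient failures end up in the DLQ, too high and a poison message occupies consumers for many cycles.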
Event Sourcing and CQRS
Event sourcing stores the history of state changes as an immutable sequence of events, rather than the current state. CQRS (Command Query Responsibility Segregation) separates write operations (commands → events) from read operations (queries → projections/read models).
// Event sourcing: instead of UPDATE accounts SET balance = 150 WHERE id = 'acc1'
// store the event (eventStore and requestId are placeholders for your event store client and request context):
const event = {
  type: 'MONEY_DEPOSITED',
  accountId: 'acc1',
  amount: 50,
  previousBalance: 100, // derived values: handy for auditing, but replay below only needs `amount`
  newBalance: 150,
  timestamp: new Date().toISOString(),
  correlationId: requestId, // ties the event back to the request that caused it
}
await eventStore.append('account-acc1', event)
// Current state is derived by replaying the stream from its first event
async function getCurrentBalance(accountId: string): Promise<number> {
  const events = await eventStore.getEvents(`account-${accountId}`)
  return events.reduce((balance, event) => {
    if (event.type === 'MONEY_DEPOSITED') return balance + event.amount
    if (event.type === 'MONEY_WITHDRAWN') return balance - event.amount
    return balance // event types that do not affect the balance are ignored
  }, 0)
}
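The snippet above covers the write path and replay, but not the CQRS read side or what happens when two writers append at once. A self-contained, in-memory sketch of both follows; `InMemoryEventStore`, `withdraw` and `BalanceReadModel` are hypothetical names, the expected-version check is one common way to implement optimistic concurrency, and the read model is updated synchronously only to keep the example small.

```typescript
// In-memory sketch of an event store with optimistic concurrency plus a CQRS read model.
// Illustrative only: names are hypothetical, and a real projection is usually updated asynchronously.
type AccountEvent =
  | { type: 'MONEY_DEPOSITED'; accountId: string; amount: number }
  | { type: 'MONEY_WITHDRAWN'; accountId: string; amount: number }

class InMemoryEventStore {
  private readonly streams = new Map<string, AccountEvent[]>()
  private readonly subscribers: Array<(event: AccountEvent) => void> = []

  // expectedVersion = number of events the writer saw; a mismatch means a concurrent write happened
  append(streamId: string, event: AccountEvent, expectedVersion: number): void {
    const stream = this.streams.get(streamId) ?? []
    if (stream.length !== expectedVersion) {
      throw new Error(`Conflict on ${streamId}: expected version ${expectedVersion}, found ${stream.length}`)
    }
    stream.push(event)
    this.streams.set(streamId, stream)
    for (const notify of this.subscribers) notify(event)
  }

  read(streamId: string): readonly AccountEvent[] {
    return [...(this.streams.get(streamId) ?? [])]
  }

  subscribe(handler: (event: AccountEvent) => void): void {
    this.subscribers.push(handler)
  }
}

const applyToBalance = (balance: number, e: AccountEvent): number =>
  e.type === 'MONEY_DEPOSITED' ? balance + e.amount : balance - e.amount

// Command side: rebuild state from the stream, validate, then append a new event
function withdraw(store: InMemoryEventStore, accountId: string, amount: number): void {
  const streamId = `account-${accountId}`
  const history = store.read(streamId)
  const balance = history.reduce(applyToBalance, 0)
  if (amount > balance) throw new Error('Insufficient funds')
  store.append(streamId, { type: 'MONEY_WITHDRAWN', accountId, amount }, history.length)
}

// Query side: a denormalized read model kept up to date from the event feed
class BalanceReadModel {
  private readonly balances = new Map<string, number>()
  project(event: AccountEvent): void {
    this.balances.set(event.accountId, applyToBalance(this.balances.get(event.accountId) ?? 0, event))
  }
  balanceOf(accountId: string): number {
    return this.balances.get(accountId) ?? 0
  }
}

const store = new InMemoryEventStore()
const readModel = new BalanceReadModel()
store.subscribe((event) => readModel.project(event))
store.append('account-acc1', { type: 'MONEY_DEPOSITED', accountId: 'acc1', amount: 100 }, 0)
withdraw(store, 'acc1', 30)
console.log(readModel.balanceOf('acc1'))
```

In production the projection typically lags the write side slightly (eventual consistency), and long streams are usually loaded from a periodic snapshot plus the events after it rather than replayed from the first event.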
Saga Pattern: Distributed Transactions
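A business transaction that spans multiple services cannot be wrapped in a single ACID database transaction, because each service owns its own database (and two-phase commit across services is rarely practical). The saga pattern breaks it into a sequence of local transactions, each of which publishes an event or calls the next service. If a step fails, compensating transactions undo the steps that already completed, so the system becomes consistent eventually rather than atomically.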
Orchestration Saga
A central orchestrator service coordinates the saga steps and handles failures:
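// Order placement saga (orchestration style).
// inventoryService, paymentService, shippingService and orderService are hypothetical service clients.
interface Order { id: string; items: unknown[]; total: number }
class OrderSaga {
  async execute(order: Order) {
    const orderId = order.id
    // Compensations for the steps that actually succeeded, recorded as the saga progresses
    const compensations: Array<() => Promise<unknown>> = []
    try {
      // Step 1: Reserve inventory
      await inventoryService.reserve({ orderId, items: order.items })
      compensations.push(() => inventoryService.release(orderId))
      // Step 2: Process payment
      const payment = await paymentService.charge({ orderId, amount: order.total })
      compensations.push(() => paymentService.refund(orderId))
      // Step 3: Create shipment
      await shippingService.createShipment({ orderId, paymentId: payment.id })
      compensations.push(() => shippingService.cancelShipment(orderId))
      await orderService.markCompleted(orderId)
    } catch (error) {
      // Undo only the steps that completed, in reverse order
      for (const compensate of compensations.reverse()) {
        // Compensations should be idempotent and retried until they succeed; never swallow failures silently
        await compensate().catch((e) => console.error(`Compensation failed for order ${orderId}`, e))
      }
      await orderService.markFailed(orderId, error instanceof Error ? error.message : String(error))
    }
  }
}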
Circuit Breaker Pattern
A circuit breaker monitors calls to a downstream service and "opens" when the failure rate exceeds a threshold: while open, calls fail fast (or return a fallback) without reaching the downstream service. This prevents cascading failures, where one slow service causes every caller to pile up threads, connections, or pending requests waiting for timeouts.
// Node.js: opossum circuit breaker library
import CircuitBreaker from 'opossum'
const options = {
  timeout: 3000, // Calls taking longer than 3s are counted as failures
  errorThresholdPercentage: 50, // Open the circuit when >50% of calls in the rolling window fail
  resetTimeout: 30000, // After 30s, let a trial call through (half-open state)
}
// paymentService is a hypothetical client for the downstream service
const breaker = new CircuitBreaker(
  (userId: string) => paymentService.getBalance(userId),
  options
)
// Served when the call fails or the circuit is open: a degraded response, not cached data
breaker.fallback((userId: string) => ({
  userId,
  balance: null,
  degraded: true,
  error: 'Payment service unavailable',
}))
// Circuit states: CLOSED (normal) → OPEN (failing fast) → HALF-OPEN (testing recovery) → CLOSED or OPEN
const result = await breaker.fire(userId)
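To make the CLOSED → OPEN → HALF-OPEN transitions concrete, here is a simplified breaker written from scratch. It is a sketch, not opossum's algorithm: it trips after a number of consecutive failures instead of a failure percentage over a rolling window, wraps synchronous functions only, and takes an injectable clock (`now`) so the timing can be tested without real waits.

```typescript
// Minimal circuit breaker state machine (a simplified sketch, not opossum's implementation).
// Assumption: opens after N consecutive failures rather than a failure percentage over a window.
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN'

class SimpleCircuitBreaker {
  private state: CircuitState = 'CLOSED'
  private consecutiveFailures = 0
  private openedAt = 0
  private readonly failureThreshold: number
  private readonly resetTimeoutMs: number
  private readonly now: () => number

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.now = now // injectable clock so the behavior can be tested without waiting
  }

  getState(): CircuitState {
    // Once the reset timeout has elapsed, let the next call through as a trial
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'
    }
    return this.state
  }

  call<T>(action: () => T, fallback: () => T): T {
    if (this.getState() === 'OPEN') return fallback() // fail fast: downstream is not called
    try {
      const result = action()
      this.state = 'CLOSED' // any success, including a half-open trial, closes the circuit
      this.consecutiveFailures = 0
      return result
    } catch {
      this.consecutiveFailures++
      if (this.state === 'HALF_OPEN' || this.consecutiveFailures >= this.failureThreshold) {
        this.state = 'OPEN' // trial failed or too many failures: (re)open
        this.openedAt = this.now()
      }
      return fallback()
    }
  }
}
```

A failed half-open trial reopens the circuit immediately and restarts the reset timer, which is what keeps a still-broken service from being flooded as soon as the timeout expires.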
Service Discovery
In Kubernetes, service discovery is built in via DNS: a Service is reachable at service-name.namespace.svc.cluster.local (cluster.local being the default cluster domain), and traffic is spread across the Service's ready pods, so no additional tooling is needed. For non-Kubernetes environments (VMs, mixed cloud), tools such as Consul or Eureka provide service registration and health checking:
# Kubernetes service DNS resolution
# The order-service Service is accessible at:
http://order-service # Same namespace
http://order-service.production # Cross-namespace shorthand
http://order-service.production.svc.cluster.local # Fully qualified
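Registries such as Consul and Eureka combine two ideas: instances register and keep proving they are alive, and clients choose among the instances that are currently healthy. A toy TypeScript model of those two ideas, using TTL heartbeats and client-side round-robin (the `ServiceRegistry` class is hypothetical and does not reflect the Consul or Eureka APIs):

```typescript
// Toy service registry with TTL heartbeats and client-side round-robin (illustrative only).
// This is not the Consul or Eureka API; it only mimics the registration + health-check idea.
interface ServiceInstance { id: string; service: string; host: string; port: number }

class ServiceRegistry {
  private readonly instances = new Map<string, ServiceInstance>()
  private readonly lastHeartbeat = new Map<string, number>()
  private readonly cursors = new Map<string, number>()
  private readonly ttlMs: number
  private readonly now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now // injectable clock for testing
  }

  register(instance: ServiceInstance): void {
    this.instances.set(instance.id, instance)
    this.heartbeat(instance.id)
  }

  // Instances must check in more often than ttlMs to be considered healthy
  heartbeat(instanceId: string): void {
    this.lastHeartbeat.set(instanceId, this.now())
  }

  healthyInstances(service: string): ServiceInstance[] {
    return [...this.instances.values()].filter((i) => {
      const seen = this.lastHeartbeat.get(i.id)
      return i.service === service && seen !== undefined && this.now() - seen < this.ttlMs
    })
  }

  // Round-robin across the currently healthy instances
  resolve(service: string): string {
    const healthy = this.healthyInstances(service)
    if (healthy.length === 0) throw new Error(`No healthy instances of ${service}`)
    const next = this.cursors.get(service) ?? 0
    this.cursors.set(service, next + 1)
    const chosen = healthy[next % healthy.length]
    return `http://${chosen.host}:${chosen.port}`
  }
}
```

In Kubernetes the equivalent work is done by readiness probes and the Service's endpoint list, which is why the DNS names above are enough there.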