How a stateless Rust WebSocket cluster does multi-tenant real-time messaging without losing a single message or letting Tenant A see a byte of Tenant B’s traffic.

Live: yappa.perceptionlabs.tech


TL;DR

Real-time chat is one of those problems that looks trivial in a tutorial (socket.on('message', cb)) and turns into a distributed systems exam the moment you need more than one server. Yappa-RT is the answer to “what does that exam answer actually look like in production”: a stateless Rust cluster, Redis Pub/Sub for cross-node fanout, Kafka for durability, Postgres for storage, and a Node auth service that the messaging layer doesn’t have to trust with anything beyond a signed token.

Every design decision below exists because the naive version broke under one specific condition. That’s the structure of this doc — problem, naive answer, why it falls over, what we did instead.


1. The actual hard part: WebSockets don’t scale the way HTTP does

An HTTP request is stateless — any server behind the load balancer can answer it. A WebSocket connection is pinned. The TCP socket lives on exactly one machine, in exactly one process’s memory, for the lifetime of the connection. The instant you run more than one server, you’ve created a routing problem that didn’t exist before: user A is connected to node 1, user B is connected to node 2, and when A sends B a message, node 1 has no idea node 2 exists, let alone that B is sitting on it.

Most people’s first instinct is sticky sessions — pin a user to a node at the load balancer level and never let them move. This works until it doesn’t: it breaks horizontal autoscaling (you can’t rebalance load without dropping connections), it creates hot nodes (one viral tenant overloads one box while the rest sit idle), and it turns every deploy into a mass-disconnect event.

Yappa-RT doesn’t pin anything. Every node is identical, stateless, and disposable. The load balancer can send any connection to any node, at any time, and the system still delivers every message correctly. That property is the entire reason the rest of the architecture looks the way it does.

                      Load Balancer (Nginx / ALB)
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
   yappa-rt :8080         yappa-rt :8080         yappa-rt :8080
     (Rust)                  (Rust)                  (Rust)
        │                      │                      │
        └──────────────────────┼──────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
     Redis :6379         yappa-auth :3001         PostgreSQL :5432
   (Pub/Sub, limits)        (Node.js)                (storage)
        │
        ▼
     Kafka :9092  (optional — PERSISTENCE_MODE=kafka)

2. Why Rust, specifically

Not a religious argument — a practical one. This workload is thousands of long-lived concurrent connections, each one mostly idle, occasionally bursting. That’s exactly the profile where:

The trade-off is real: slower to write, a borrow checker that argues with you, fewer “just npm install it” shortcuts. For a connection-dense, long-running service where a memory leak or a GC stutter shows up as a customer-facing outage, that trade was worth making.