How to Scale a SaaS Platform From 100 to 100,000 Users
A practical, stage-by-stage guide to scaling a SaaS product — what to do at 1k, 10k, and 100k users, where the real bottlenecks are, and when to invest in infrastructure vs product.
TL;DR
- The database is almost always your first bottleneck — profile queries and add indexes before buying bigger servers
- Make your application stateless from day one — this is the key enabler of horizontal scaling
- Add Redis caching at 1k–5k users, read replicas at 10k–20k users
- Use a CDN (Cloudflare) for static assets from day one — it's free and dramatically reduces server load
- Move long-running work to background job queues — never do it in an API request
The Golden Rule: Don't Optimise Before You Need To
Premature optimisation is the biggest scaling mistake. Many SaaS products invest weeks in distributed systems and microservices architecture before they have 1,000 users — and never need them. Build a simple, well-structured monolith first. Optimise when you have measured evidence of a bottleneck, not when you imagine one.
The scaling stages below tell you what the real problems are at each tier — and what actually fixes them.
Stage 1: 0–1,000 Users — Survive Launch
Primary concern: Does the product work correctly? Can users sign up, pay, and complete the core workflow?
- Single application server (1–2 vCPUs, 2–4GB RAM)
- Managed PostgreSQL (smallest tier — 1 vCPU, 1–2GB)
- Cloudflare CDN for static assets and DDoS protection (free tier)
- Sentry for error tracking — you need to know when things break
- Basic request logging so you can debug production issues
- Infrastructure cost: ~£100–£200/month
Stage 2: 1,000–10,000 Users — Performance Matters
Primary concern: Slow pages and API timeouts. The database is now feeling the load.
- Profile slow queries: Enable PostgreSQL's pg_stat_statements and identify queries over 100ms. Most slow queries have a missing index.
- Add Redis caching: Cache expensive, frequently-accessed reads (dashboard aggregates, plan data, user profile) with a 60–300 second TTL.
- Background jobs: Move email sending, PDF generation, AI inference, and webhook delivery to Celery workers. API responses stay under 200ms.
- Upgrade database instance: Move to 2–4 vCPU PostgreSQL. Tune shared_buffers and work_mem.
- Infrastructure cost: ~£400–£800/month
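The caching bullet above describes what is usually called the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with a TTL. The sketch below is a minimal illustration, not a production client. `TTLCache` is a hypothetical in-process stand-in for Redis so the example runs without a server, and `cached_read` and its parameter names are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for Redis: values expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


def cached_read(cache: TTLCache, key: str, ttl_seconds: float,
                load: Callable[[], Any]) -> Any:
    """Cache-aside read: return the cached value, or load it and cache it.

    Note: a loader that returns None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load()  # the expensive query runs only on a miss
        cache.set(key, value, ttl_seconds)
    return value
```

A dashboard endpoint would call something like `cached_read(cache, "dashboard:42", 120, load_dashboard)`, where `load_dashboard` is whatever function runs the expensive query. With a real Redis server the same shape applies, with the dictionary replaced by the Redis client's get and set-with-expiry calls and the value serialised (for example as JSON).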
Stage 3: 10,000–50,000 Users — Horizontal Scaling
Primary concern: Single application server is CPU/memory bound during peak hours.
- Horizontal application scaling: Deploy 3–5 application server instances behind an AWS ALB or Nginx load balancer. This is only possible if your app is stateless (no session state held in an individual server's memory).
- Read replica for PostgreSQL: Route all SELECT queries that don't need to be instantly consistent (reports, dashboard, list views) to a read replica. Reduces primary database load by 40–70%.
- Auto-scaling groups: Set CPU threshold rules — automatically add instances when CPU > 70%, remove when CPU < 30%.
- Connection pooling: Add PgBouncer (transaction mode) between app servers and PostgreSQL. Prevents connection exhaustion with many app instances.
- Infrastructure cost: ~£1,000–£3,000/month
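The read-replica rule above comes down to a routing decision made for each statement. The sketch below shows one way that decision might look. The function name `choose_database`, the `needs_fresh_data` flag, and the `"primary"`/`"replica"` labels are all assumptions for illustration; many frameworks provide their own routing hooks that should be preferred where available.

```python
PRIMARY = "primary"
REPLICA = "replica"


def choose_database(sql: str, needs_fresh_data: bool = False) -> str:
    """Pick the connection a statement should use (illustrative only).

    Plain SELECTs that tolerate replication lag go to the replica.
    Everything else, including SELECT ... FOR UPDATE and reads that
    must see the latest write, goes to the primary.
    """
    normalised = " ".join(sql.upper().split())  # collapse whitespace
    is_plain_select = (
        normalised.startswith("SELECT")
        and "FOR UPDATE" not in normalised
    )
    if is_plain_select and not needs_fresh_data:
        return REPLICA
    return PRIMARY
```

Reports, dashboards, and list views would call this with the default flag. A page that reads a row immediately after writing it would pass `needs_fresh_data=True` so the user does not see stale data while the replica catches up.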
Stage 4: 50,000–100,000 Users — Distributed Systems
Primary concern: Specific features or services become bottlenecks; monolith can't scale parts independently.
- Extract hot services: If one feature (e.g., AI processing, media handling) dominates server load, extract it to a separate service that can scale independently.
- Database sharding or multi-region: If you serve multiple geographies, consider multi-region deployments with regional PostgreSQL replicas to reduce latency.
- Dedicated worker fleet: Scale background job workers separately from API servers based on queue depth.
- CDN for API responses: Cache public API responses (e.g., public listing pages) at the CDN edge for sub-10ms response times globally.
- Infrastructure cost: ~£5,000–£15,000/month
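Scaling workers on queue depth, as the dedicated worker fleet bullet suggests, can be reduced to a small calculation: how many workers are needed to drain the current backlog within a target time. The sketch below is one possible rule under stated assumptions. The function name, the throughput figure, and the minimum and maximum bounds are hypothetical and would need to be measured and tuned for a real system.

```python
import math


def desired_workers(
    queue_depth: int,
    jobs_per_worker_per_minute: float,
    target_drain_minutes: float,
    min_workers: int = 1,
    max_workers: int = 20,
) -> int:
    """Return a worker count that would drain the backlog in the target time."""
    if jobs_per_worker_per_minute <= 0 or target_drain_minutes <= 0:
        raise ValueError("throughput and drain time must be positive")
    # Jobs one worker can clear within the target window.
    capacity_per_worker = jobs_per_worker_per_minute * target_drain_minutes
    needed = math.ceil(queue_depth / capacity_per_worker)
    # Clamp so the fleet never scales to zero or grows without bound.
    return max(min_workers, min(max_workers, needed))
```

For example, with 600 queued jobs, 30 jobs per worker per minute, and a 5-minute target, each worker clears 150 jobs in the window, so the rule asks for 4 workers.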
The Scaling Decisions That Matter Most
| Decision | Impact | When |
|---|---|---|
| Stateless application servers | Enables horizontal scaling — must be decided at build time | Day 1 |
| Database indexes on query-heavy columns | 10–100× query speedup; free performance gain | 1k–5k users |
| Redis caching for expensive reads | Reduces DB load 30–60% for read-heavy operations | 1k–5k users |
| Background job queues | Keeps API response times fast; prevents timeouts | Before launch |
| Read replica | Removes 40–70% of read load from primary | 10k–20k users |
| Horizontal app scaling | Linear throughput increase with server count | 10k–20k users |
The Most Common Scaling Mistakes
Buying bigger servers instead of fixing the code
Vertical scaling (a bigger machine) can easily cost 10× more than fixing a missing index or an N+1 query problem. Profile first, scale second. Adding one missing index can make a query 100× faster, with no infrastructure change needed.
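The effect of a missing index is easy to see in a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, purely so it runs anywhere without a server. The `orders` table and index name are made up for illustration. On PostgreSQL the equivalent check would be `EXPLAIN` or `EXPLAIN ANALYZE` on the slow query.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)  # column 3 holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE user_id = ?"

# Without an index the planner has to scan every row in the table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filtered column it can jump to the matching rows.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of `orders`, while the second reports a search using `idx_orders_user_id`. The exact wording of the plan text varies between SQLite versions.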
Doing heavy work in API requests
Any operation that takes more than 500ms (AI inference, PDF generation, report aggregation, sending emails) must go into a background job. Synchronous long-running requests hold connections, exhaust thread pools, and cause timeouts under load.
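The pattern behind this advice is that the request handler only enqueues the work and returns at once, while a separate worker does the slow part. The sketch below illustrates that shape with the standard library's `queue` and `threading` modules as a stand-in for a real job system such as Celery. The names `enqueue`, `handle_report_request`, and `generate_report` are hypothetical, and an in-process thread does not survive restarts the way a real queue does.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # pending work: (job_id, function, args)
results = {}           # job_id -> result (or the exception that was raised)


def worker() -> None:
    """Pull jobs off the queue and run them outside the request path."""
    while True:
        job_id, func, args = jobs.get()
        try:
            results[job_id] = func(*args)
        except Exception as exc:
            # Record the failure instead of killing the worker thread.
            results[job_id] = exc
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def enqueue(func, *args) -> str:
    """Queue a job and return an ID the client can poll with."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, func, args))
    return job_id


def generate_report(user_id: int) -> str:
    # Placeholder for slow work such as PDF generation or report aggregation.
    return f"report-for-{user_id}"


def handle_report_request(user_id: int) -> dict:
    """Hypothetical API handler: responds immediately, work happens later."""
    job_id = enqueue(generate_report, user_id)
    return {"status": "accepted", "job_id": job_id}
```

The client receives a job ID straight away and can poll for the result or be notified when it is ready, so no request connection is held open while the report is built.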
Storing session state in application memory
If your application stores user sessions in memory, you can't scale horizontally, because a user's next request might go to a different server with no knowledge of their session. Use JWTs or Redis-stored sessions from day one.
Need to Scale Your SaaS Product?
We help SaaS teams diagnose performance bottlenecks, optimise database queries, implement caching, and architect for horizontal scale — without over-engineering.