When should a SaaS product start investing in scaling infrastructure?

Start optimising database queries and adding basic Redis caching at 1,000–5,000 users. Add read replicas and horizontal application scaling at 10,000–20,000 users. Invest in a full distributed architecture only when your existing infrastructure is measurably failing to handle load — don't over-engineer ahead of need.

What is the most common bottleneck in a growing SaaS product?

The database is almost always the first bottleneck. Specifically: missing indexes on frequently queried columns, N+1 query problems in ORM code, and large joins on unoptimised tables. Profile your slow queries before adding infrastructure.
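To make the N+1 pattern concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs anywhere; the users and orders tables, the counter dictionary, and the function names are all hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

counter = {"queries": 0}  # hypothetical query counter, like a query log


def run(sql, params=()):
    """Execute a query and count the round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def totals_n_plus_one():
    # 1 query for the user list, then 1 more per user: N+1 round trips.
    result = {}
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        rows = run(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        )
        result[name] = rows[0][0]
    return result


def totals_single_query():
    # One JOIN with GROUP BY returns the same data in a single round trip.
    rows = run("""
        SELECT u.name, COALESCE(SUM(o.total), 0)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id, u.name
        ORDER BY u.id
    """)
    return dict(rows)
```

With three users the first version issues four queries and the second issues one. The gap grows linearly with the number of rows, which is why this pattern tends to surface only once real data accumulates.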

How do you horizontally scale a SaaS backend?

To horizontally scale, the application servers must be stateless — no session data stored in memory. All state goes to PostgreSQL (persistent) or Redis (ephemeral). Then put a load balancer in front of multiple application instances. Add or remove instances based on CPU/memory metrics.

How to Scale a SaaS Platform From 100 to 100,000 Users

Last updated: 2026-05-23

A practical, stage-by-stage guide to scaling a SaaS product — what to do at 1k, 10k, and 100k users, where the real bottlenecks are, and when to invest in infrastructure vs product.

By SpiderHunts Technologies · 23 May 2026 · 10 min read

TL;DR

The database is almost always your first bottleneck — profile queries and add indexes before buying bigger servers
Make your application stateless from day one — this is the key enabler of horizontal scaling
Add Redis caching at 1k–5k users, read replicas at 10k–20k users
Use a CDN (Cloudflare) for static assets from day one: the free tier is enough to start with and it dramatically reduces server load
Move long-running work to background job queues — never do it in an API request

The Golden Rule: Don't Optimise Before You Need To

Premature optimisation is the biggest scaling mistake. Many SaaS products invest weeks in distributed systems and microservices architecture before they have 1,000 users, and most of them never end up needing it. Build a simple, well-structured monolith first. Optimise when you have measured evidence of a bottleneck, not when you imagine one.

The scaling stages below tell you what the real problems are at each tier — and what actually fixes them.

Stage 1: 0–1,000 Users — Survive Launch

Primary concern: Does the product work correctly? Can users sign up, pay, and complete the core workflow?

Single application server (1–2 vCPUs, 2–4GB RAM)
Managed PostgreSQL (smallest tier — 1 vCPU, 1–2GB)
Cloudflare CDN for static assets and DDoS protection (free tier)
Sentry for error tracking — you need to know when things break
Basic request logging so you can debug production issues
Infrastructure cost: ~£100–£200/month

Stage 2: 1,000–10,000 Users — Performance Matters

Primary concern: Slow pages and API timeouts. The database is now feeling the load.

Profile slow queries: Enable PostgreSQL's pg_stat_statements and identify queries over 100ms. Most slow queries have a missing index.
Add Redis caching: Cache expensive, frequently-accessed reads (dashboard aggregates, plan data, user profile) with a 60–300 second TTL.
Background jobs: Move email sending, PDF generation, AI inference, and webhook delivery to Celery workers. API responses stay under 200ms.
Upgrade database instance: Move to 2–4 vCPU PostgreSQL. Tune shared_buffers and work_mem.
Infrastructure cost: ~£400–£800/month
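The caching step in this stage is usually implemented as a cache-aside read: check the cache, fall back to the database on a miss, then store the result with a TTL. The sketch below is one way to express that. TTLCache is a hypothetical in-memory stand-in that mimics the get/setex shape of a Redis client so the example runs without a server; cached_read and the key name are also assumptions, not part of any library.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis client (get / setex only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (self._clock() + ttl_seconds, value)


def cached_read(cache, key, ttl_seconds, load):
    """Cache-aside read: return the cached value, or load and cache it."""
    value = cache.get(key)
    if value is None:  # note: a loader that returns None is never cached
        value = load()
        cache.setex(key, ttl_seconds, value)
    return value
```

A dashboard endpoint might call cached_read(cache, "dashboard:42", 120, load_dashboard), where load_dashboard is whatever function runs the expensive query. A TTL in the 60–300 second range bounds how stale a dashboard can be without needing explicit invalidation.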

Stage 3: 10,000–50,000 Users — Horizontal Scaling

Primary concern: Single application server is CPU/memory bound during peak hours.

Horizontal application scaling: Deploy 3–5 application server instances behind an AWS ALB or Nginx load balancer. This is only possible if your app is stateless (no session data held in application-server memory).
Read replica for PostgreSQL: Route all SELECT queries that don't need to be instantly consistent (reports, dashboard, list views) to a read replica. Reduces primary database load by 40–70%.
Auto-scaling groups: Set CPU threshold rules — automatically add instances when CPU > 70%, remove when CPU < 30%.
Connection pooling: Add PgBouncer (transaction mode) between app servers and PostgreSQL. Prevents connection exhaustion with many app instances.
Infrastructure cost: ~£1,000–£3,000/month
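Read-replica routing from the list above comes down to one decision per query: can this read tolerate replication lag? The function below is a rough sketch of that decision. The name choose_database, the needs_fresh_read flag, and the "primary"/"replica" labels are hypothetical; a real application would map the labels to connection pools.

```python
def choose_database(sql: str, needs_fresh_read: bool = False) -> str:
    """Return "replica" for reads that tolerate lag, otherwise "primary"."""
    words = sql.split()
    first_word = words[0].upper() if words else ""
    normalized = " ".join(words).upper()

    is_plain_select = first_word == "SELECT" and "FOR UPDATE" not in normalized
    if is_plain_select and not needs_fresh_read:
        return "replica"
    # Writes, locking reads, and anything unrecognised go to the primary.
    return "primary"
```

Defaulting to the primary for anything that is not clearly a lag-tolerant SELECT is the conservative choice: a misrouted report is slightly slower, whereas a misrouted write or read-after-write shows users stale or missing data.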

Stage 4: 50,000–100,000 Users — Distributed Systems

Primary concern: Specific features or services become bottlenecks; monolith can't scale parts independently.

Extract hot services: If one feature (e.g., AI processing, media handling) dominates server load, extract it to a separate service that can scale independently.
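Database sharding or multi-region: If you serve multiple geographies, consider multi-region deployments with regional PostgreSQL replicas to reduce latency. Consider sharding only when write volume exceeds what a single primary can handle, since replicas add read capacity but not write capacity.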
Dedicated worker fleet: Scale background job workers separately from API servers based on queue depth.
CDN for API responses: Cache public API responses (e.g., public listing pages) at the CDN edge for sub-10ms response times globally.
Infrastructure cost: ~£5,000–£15,000/month
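Scaling a worker fleet on queue depth can be reduced to a small sizing rule. The sketch below is illustrative only: the function name and every default (50 jobs per worker, a floor of 1, a ceiling of 20) are assumptions to be replaced with numbers measured from your own queue.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int = 50,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Size the worker fleet from the current backlog, within fixed bounds."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queue_depth / jobs_per_worker)
    # Clamp so the fleet never scales to zero or beyond the budgeted ceiling.
    return max(min_workers, min(max_workers, needed))
```

Because the input is queue depth rather than CPU, the worker fleet grows when a backlog builds up even if the API servers themselves are idle.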

The Scaling Decisions That Matter Most

Decision	Impact	When
Stateless application servers	Enables horizontal scaling — must be decided at build time	Day 1
Database indexes on query-heavy columns	10–100× query speedup; free performance gain	1k–5k users
Redis caching for expensive reads	Reduces DB load 30–60% for read-heavy operations	1k–5k users
Background job queues	Keeps API response times fast; prevents timeouts	Before launch
Read replica	Removes 40–70% of read load from primary	10k–20k users
Horizontal app scaling	Linear throughput increase with server count	10k–20k users

The Most Common Scaling Mistakes

Buying bigger servers instead of fixing the code

Vertical scaling (a bigger machine) can be 10× more expensive than fixing a missing index or an N+1 query problem. Profile first, scale second. One missing index can make a query 100× faster, with no infrastructure change needed.
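The effect of a missing index is easy to see in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without a database server; on PostgreSQL the equivalent check would be EXPLAIN on the slow query. The orders table, the index name, and the query_plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(5000)],
)


def query_plan(sql: str) -> str:
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the detail text


LOOKUP = "SELECT id, total FROM orders WHERE user_id = 42"

plan_before = query_plan(LOOKUP)  # full table scan: every row is read
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = query_plan(LOOKUP)   # index search: only matching rows are read
```

Printing plan_before and plan_after shows the planner switching from scanning the whole table to searching idx_orders_user_id, which is the change that produces the large speedups on big tables.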

Doing heavy work in API requests

Any operation that takes more than 500ms (AI inference, PDF generation, report aggregation, sending emails) must go into a background job. Synchronous long-running requests hold connections, exhaust thread pools, and cause timeouts under load.
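The pattern is the same whatever queue you use: the request handler enqueues the job and returns at once, and a separate worker does the slow part. The sketch below uses the standard library's queue and threading modules as a stand-in for a real job queue such as Celery; generate_report, request_report, and the 0.2 second sleep are hypothetical placeholders.

```python
import queue
import threading
import time

jobs = queue.Queue()
completed = []  # records finished jobs so the effect is observable


def generate_report(account_id):
    time.sleep(0.2)  # stands in for slow work (PDF, AI inference, email)
    completed.append(account_id)


def worker():
    # Runs outside the request path and drains the queue forever.
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def request_report(account_id):
    """API handler: enqueue the work and respond immediately."""
    jobs.put((generate_report, (account_id,)))
    return {"status": "accepted", "account_id": account_id}
```

The handler returns an "accepted" response before the report exists, so the client needs a way to collect the result later, for example by polling a status endpoint or receiving a webhook.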

Storing session state in application memory

If your application stores user sessions in memory, you can't scale horizontally. A user's next request might go to a different server with no knowledge of their session. Use JWTs or Redis-stored sessions from day one.
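A small simulation shows why the location of the session store matters. In the sketch below a plain dictionary stands in for Redis, and the AppInstance class with its login and whoami methods is hypothetical. The only difference between the working and failing cases is whether the instances share one store.

```python
class AppInstance:
    """One application server. It holds a reference to a session store."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store  # a dict here; Redis in production

    def login(self, session_id, user_id):
        self.sessions[session_id] = {"user_id": user_id}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user_id"] if session else None


# Shared external store: any instance can serve any request.
shared_store = {}
server_a = AppInstance("a", shared_store)
server_b = AppInstance("b", shared_store)

# Per-instance memory: a session created on one server is unknown to the other.
server_c = AppInstance("c", {})
server_d = AppInstance("d", {})
```

If a user logs in through server_a, server_b can resolve the same session ID. If they log in through server_c, server_d returns None, which is what a user experiences as being logged out at random behind a load balancer.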

Need to Scale Your SaaS Product?

We help SaaS teams diagnose performance bottlenecks, optimise database queries, implement caching, and architect for horizontal scale — without over-engineering.

