Should a SaaS startup use managed databases (RDS) or self-hosted?

For almost all SaaS startups, managed databases (AWS RDS, Azure Database for PostgreSQL, GCP Cloud SQL) are the right choice. Managed databases provide automated backups, automated failover (Multi-AZ), automated minor version patching, and monitoring — tasks that would require a dedicated DBA or senior engineer to handle on a self-hosted database. The premium over self-hosted is typically 30 to 50% on compute, but the operational time saved is worth far more than that cost.

Should we use Terraform or Pulumi for SaaS infrastructure?

Both are excellent choices. Terraform is the more established option with the largest community, most AWS provider resources, and most examples. Pulumi is better if your team prefers writing infrastructure code in Python or TypeScript rather than HCL (Terraform's language). For most teams, Terraform is the pragmatic choice. Whichever you choose, start using it from day one — retrofitting infrastructure-as-code onto a manually created AWS account is painful.

How should a SaaS startup structure AWS accounts?

The recommended structure is AWS Organizations with separate accounts per environment: a management/root account (billing and IAM identity only), a production account, a staging account, and a development account. This provides billing isolation, blast radius reduction (a misconfiguration in dev cannot affect prod), and clear cost attribution per environment. AWS Control Tower automates this multi-account setup with security guardrails.

What is the minimum viable monitoring setup for a SaaS startup?

At minimum: uptime monitoring (Pingdom, Better Uptime, or AWS Route 53 health checks), error tracking (Sentry — free tier is generous), and structured application logging to CloudWatch Logs. Add CloudWatch alarms for high error rates, database CPU above 80%, and memory exhaustion. Sentry integrations with Slack give you instant notifications when new errors occur in production. This setup takes a day to implement and is valuable from your first production customer.

Cloud Engineering for SaaS Startups

TL;DR: Start with managed services (RDS, not self-hosted Postgres), infrastructure-as-code from day one (Terraform), environment separation (dev/staging/prod accounts), secrets in Secrets Manager (never in .env files), structured logging, uptime monitoring, and budget alerts. A well-built startup cloud stack costs £300 to £600/month to start and scales gracefully. A poorly built one costs the same but turns into a painful rebuild six months later.

Why Getting Infrastructure Right From Day One Matters

We frequently work with SaaS startups that have landed their first major customer and need to pass a security audit. They then discover that their infrastructure was clicked together in the AWS console, with the following problems:

no automation
no environment separation
database credentials hardcoded in environment variables
backups never tested
no monitoring

The "quick fix" that felt pragmatic at the start is now a month-long refactoring project happening in parallel with serving paying customers.

The time investment to do infrastructure properly from day one is two to three weeks for an experienced cloud engineer. The time to fix it properly later, under production pressure, is three to eight weeks. The choice is straightforward if you know what "properly" means — which this guide will explain.

Choose Managed Services Over Self-Managed

The most impactful decision for a SaaS startup's infrastructure is choosing managed services at every layer. Self-hosting a component means you own:

initial setup
OS patching
monitoring
backup
failover
version upgrades
incident response

For a small team, each self-managed component is an operational burden that competes with shipping product.

Layer	Self-Managed	Managed Service	Recommendation
PostgreSQL	EC2 + self-managed Postgres	AWS RDS / Aurora Serverless	Managed (RDS)
Redis	EC2 + self-managed Redis	AWS ElastiCache for Redis	Managed (ElastiCache)
Application servers	EC2 instances	ECS Fargate / EKS	ECS Fargate (for most startups)
Load balancer	Nginx on EC2	AWS ALB	Managed (ALB)
SSL/TLS certificates	Let's Encrypt + cron renewal	AWS ACM (auto-renewal)	Managed (ACM)
Message queue	RabbitMQ on EC2	AWS SQS	Managed (SQS)
File storage	EBS / EFS on EC2	AWS S3	Managed (S3)
DNS	Self-managed BIND / CoreDNS	Route 53	Managed (Route 53)
Secrets	.env files, SSM Parameter Store	AWS Secrets Manager	Managed (Secrets Manager)
Container registry	Docker Hub	AWS ECR	Managed (ECR)

Infrastructure-as-Code From Day One

Never create AWS resources by clicking in the console — or if you do for exploration, immediately codify them in Terraform. Infrastructure-as-code (IaC) gives you:

a version-controlled audit log of every infrastructure change
reproducible environments (staging is identical to production, just smaller)
peer review for infrastructure changes
the ability to tear down and recreate environments in minutes

Use Terraform modules to avoid repeating yourself. A good module structure:

a vpc module (VPC, subnets, routing, NAT Gateway)
an ecs-service module (ECS task definition, service, ALB target group, security groups, autoscaling)
an rds module (RDS instance, subnet group, parameter group, security group)
a secrets module (Secrets Manager secrets with rotation)
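
As a rough sketch of how a root configuration might call modules like these: the directory layout, the module input names (`vpc_id`, `container_image`, and so on), and the module output names below are all hypothetical and depend on how you write the modules yourself.

```hcl
# environments/prod/main.tf
# Hypothetical layout: local modules live in ../../modules.

variable "api_image" {
  description = "Full image reference for the API container (for example an ECR URL with a git SHA tag)"
  type        = string
}

module "vpc" {
  source = "../../modules/vpc"

  name = "myapp-prod"
  cidr = "10.0.0.0/16"
}

module "api" {
  source = "../../modules/ecs-service"

  name               = "api"
  vpc_id             = module.vpc.vpc_id             # assumed module output
  private_subnet_ids = module.vpc.private_subnet_ids # assumed module output
  container_image    = var.api_image
  container_port     = 8000
  desired_count      = 2
}

module "database" {
  source = "../../modules/rds"

  identifier     = "myapp-prod-postgres"
  instance_class = "db.t4g.medium"
  subnet_ids     = module.vpc.database_subnet_ids # assumed module output
  multi_az       = true
}
```

A staging configuration would call the same modules with smaller instance classes and a lower `desired_count`, which is what keeps the environments structurally identical.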

Terraform: Complete 3-Tier SaaS Infrastructure

# main.tf: Complete SaaS startup infrastructure (simplified)
# Assumes AWS provider configured with eu-west-2 (London).
# Supporting resources referenced below (security groups, IAM roles, KMS key,
# ACM certificate, target group, secrets, random passwords) are omitted for brevity.

terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
  backend "s3" {
    bucket = "myapp-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "eu-west-2"
  }
}

provider "aws" { region = "eu-west-2" }

# ─── VPC ───────────────────────────────────────────────────────
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "myapp-prod"
  cidr = "10.0.0.0/16"

  azs              = ["eu-west-2a", "eu-west-2b", "eu-west-2c"]
  private_subnets  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets   = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  database_subnets = ["10.0.201.0/24", "10.0.202.0/24", "10.0.203.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false # one per AZ for HA
  enable_dns_hostnames = true
  enable_dns_support   = true

  create_database_subnet_group = true

  tags = {
    Project     = "myapp"
    Environment = "prod"
    ManagedBy   = "terraform"
  }
}

# ─── ALB (Application Load Balancer) ──────────────────────────
resource "aws_lb" "main" {
  name               = "myapp-prod-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = true

  tags = { Name = "myapp-prod-alb" }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.main.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

# ─── RDS (PostgreSQL) ─────────────────────────────────────────
resource "aws_db_instance" "main" {
  identifier        = "myapp-prod-postgres"
  engine            = "postgres"
  engine_version    = "16.2"
  instance_class    = "db.t4g.medium"
  allocated_storage = 100
  storage_type      = "gp3"
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn

  db_name  = "myapp"
  username = "myapp_admin"
  password = random_password.db_password.result

  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [aws_security_group.rds.id]

  multi_az                = true # HA in production
  backup_retention_period = 30   # 30 days of automated backups
  backup_window           = "02:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:06:00"

  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "myapp-prod-final-snapshot"

  performance_insights_enabled = true

  tags = { Name = "myapp-prod-postgres" }
}

# ─── ElastiCache (Redis) ─────────────────────────────────────
resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "myapp-prod-redis"
  description          = "Redis for session cache and Celery broker"

  node_type          = "cache.t4g.small"
  num_cache_clusters = 2 # primary + one replica
  port               = 6379

  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_token.result

  tags = { Name = "myapp-prod-redis" }
}

# ─── ECS Fargate (Application) ───────────────────────────────
resource "aws_ecs_cluster" "main" {
  name = "myapp-prod"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_task_definition" "api" {
  family                   = "myapp-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([{
    name      = "api"
    image     = "${aws_ecr_repository.api.repository_url}:latest" # initial image; CI/CD deploys git SHA tags
    essential = true

    portMappings = [{ containerPort = 8000, protocol = "tcp" }]

    environment = [
      { name = "ENVIRONMENT", value = "production" }
    ]

    # REDIS_URL embeds the auth token, so it is injected from Secrets Manager like
    # the other credentials rather than set as a plaintext environment variable.
    # aws_secretsmanager_secret.redis_url is assumed to hold the full rediss:// URL
    # and to be defined with the other secrets.
    secrets = [
      { name = "DATABASE_URL", valueFrom = aws_secretsmanager_secret.db_url.arn },
      { name = "REDIS_URL", valueFrom = aws_secretsmanager_secret.redis_url.arn },
      { name = "SECRET_KEY", valueFrom = aws_secretsmanager_secret.app_secret.arn }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.api.name
        "awslogs-region"        = "eu-west-2"
        "awslogs-stream-prefix" = "api"
      }
    }

    # The health check runs inside the container, so the image must include curl.
    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "api"
    container_port   = 8000
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # CI/CD registers a new task definition revision on each deploy, so Terraform
  # should not roll the service back to the revision it last applied.
  lifecycle { ignore_changes = [task_definition] }
}

Environment Setup: Dev / Staging / Production

Use AWS Organizations with separate accounts for each environment. Production gets its own account — isolated billing, separate IAM, separate network. Developers have access to the development account, and senior engineers to staging. Only the CI/CD pipeline (and on-call engineers) can deploy to production. This prevents the classic "I was debugging in prod" incident.
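
A minimal sketch of this account layout in Terraform, applied from the management account. The organizational unit name, account names, and email addresses are placeholders, and AWS Control Tower is an alternative way to reach the same structure.

```hcl
# Applied with credentials for the organization's management account.
resource "aws_organizations_organization" "this" {
  feature_set = "ALL"
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "workloads" # placeholder OU name
  parent_id = aws_organizations_organization.this.roots[0].id
}

locals {
  environments = ["dev", "staging", "prod"]
}

# One member account per environment.
resource "aws_organizations_account" "env" {
  for_each = toset(local.environments)

  name      = "myapp-${each.key}"
  email     = "aws+myapp-${each.key}@example.com" # placeholder; must be unique per account
  parent_id = aws_organizations_organizational_unit.workloads.id

  tags = {
    Project     = "myapp"
    Environment = each.key
    ManagedBy   = "terraform"
  }
}
```

Access rules such as "only the pipeline deploys to production" are then expressed as IAM roles and permission sets inside each account, not in this file.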

Apply consistent resource tagging across all environments: Project, Environment, Team, ManagedBy. Tags enable cost allocation reports that show exactly how much each environment and component costs — essential for FinOps as you scale.
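
One way to keep tags consistent is the AWS provider's `default_tags` block, which applies the same tags to every taggable resource the provider creates. The sketch below would replace the plain provider block in a given environment; the `Team` value is a placeholder.

```hcl
provider "aws" {
  region = "eu-west-2"

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = {
      Project     = "myapp"
      Environment = "prod"
      Team        = "platform" # placeholder team name
      ManagedBy   = "terraform"
    }
  }
}
```

Tags only show up in cost reports after they have been activated as cost allocation tags in the billing settings.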

CI/CD Pipeline: GitHub Actions + ECR + ECS

A minimal but production-ready CI/CD pipeline for a containerised SaaS application: on pull request, run tests and build the Docker image. On merge to main, push the image to ECR with a git SHA tag. Then run database migrations in a one-off ECS task. Finally, update the ECS service with the new image tag. ECS's deployment circuit breaker automatically rolls back if the new tasks fail their health checks.

Use GitHub Actions OIDC to authenticate to AWS without storing long-lived access keys in GitHub secrets. You register GitHub as an OIDC identity provider in IAM, and the workflow then assumes an IAM role using short-lived tokens. Because there is no long-lived key to leak, this is significantly more secure than storing access keys and rotating them.
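
The OIDC trust could be set up roughly as follows. This is a sketch under stated assumptions: the repository `my-org/my-repo` and the role name are hypothetical, deployments are restricted to the `main` branch, and the thumbprint is the value commonly published for GitHub's endpoint, which you should verify before relying on it.

```hcl
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]

  # Commonly published thumbprint for GitHub's OIDC endpoint; verify before use.
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

# Trust policy: only workflows from one repository and branch may assume the role.
data "aws_iam_policy_document" "github_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/my-repo:ref:refs/heads/main"] # hypothetical repository
    }
  }
}

resource "aws_iam_role" "github_deploy" {
  name               = "myapp-github-deploy" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.github_assume.json
}
```

The role still needs a permissions policy limited to what the pipeline does (pushing to ECR, registering task definitions, updating the ECS service), and the workflow must request the `id-token: write` permission to obtain a token.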

Secrets Management

The cardinal rule: never put credentials in plaintext environment variables baked into task definitions or images, in .env files committed to git, or in EC2 user data scripts. Values stored this way leak into logs and consoles, are visible to anyone with access to the repository or the instance, and cannot be rotated without a deployment.

Use AWS Secrets Manager from day one. Store your database URL, API keys, third-party service credentials, and application secret keys there. Reference them by ARN in the secrets section of the ECS task definition: ECS retrieves and injects them at runtime, so the values never appear in your application code or in the task definition itself. Enable automatic rotation for your database password; AWS Secrets Manager handles this natively for RDS.
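
A hedged sketch of how the `DATABASE_URL` secret might be defined. The secret name and the policy name are hypothetical, and the resource names `db_url`, `db_password`, `aws_db_instance.main`, and `aws_iam_role.ecs_execution` are assumed to match the ones referenced in main.tf.

```hcl
resource "random_password" "db_password" {
  length  = 32
  special = false # avoids characters that would need URL-encoding in the connection string
}

resource "aws_secretsmanager_secret" "db_url" {
  name = "myapp/prod/database-url" # hypothetical naming scheme
}

# The secret value is the full connection string consumed by the application.
resource "aws_secretsmanager_secret_version" "db_url" {
  secret_id     = aws_secretsmanager_secret.db_url.id
  secret_string = "postgresql://${aws_db_instance.main.username}:${random_password.db_password.result}@${aws_db_instance.main.address}:${aws_db_instance.main.port}/${aws_db_instance.main.db_name}"
}

# The ECS execution role fetches secrets at task start, so it needs read access.
resource "aws_iam_role_policy" "read_secrets" {
  name = "read-app-secrets"
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.db_url.arn]
    }]
  })
}
```

With this approach the generated password is still recorded in Terraform state, so the state bucket should be encrypted and tightly access-controlled.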

Monitoring From Day One

Structured logging: configure your application to output JSON-formatted logs (not plain text). JSON logs are structured, searchable, and filterable in CloudWatch Logs Insights. Add fields like request_id, user_id, duration_ms, status_code, and error_message to every log line. This turns your logs from a wall of text into a queryable database of application events.
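
To illustrate what "queryable" means here, the sketch below defines the log group and a saved CloudWatch Logs Insights query over the JSON fields named above. The log group name and query name are hypothetical, and the query assumes the application really emits `status_code` as a number.

```hcl
resource "aws_cloudwatch_log_group" "api" {
  name              = "/ecs/myapp-api" # hypothetical log group name
  retention_in_days = 30
}

# Saved Logs Insights query: the 50 most recent requests that returned a 5xx.
resource "aws_cloudwatch_query_definition" "server_errors" {
  name            = "myapp/api-server-errors"
  log_group_names = [aws_cloudwatch_log_group.api.name]

  query_string = <<-EOT
    fields @timestamp, request_id, user_id, status_code, duration_ms, error_message
    | filter status_code >= 500
    | sort @timestamp desc
    | limit 50
  EOT
}
```

Logs Insights discovers the fields of JSON log lines automatically, which is why the same query is not possible against plain-text logs without extra parsing.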

Uptime monitoring: set up an external uptime monitor (Pingdom, Better Uptime, or AWS Route 53 health checks). It should alert immediately if your production URL becomes unreachable. Internal monitoring can't tell you the service is down if the monitoring system itself is affected by the same outage.
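
If you pick Route 53 health checks, a minimal definition might look like this. The hostname is a placeholder, and the `/health` path is assumed to be the same endpoint the container health check uses.

```hcl
resource "aws_route53_health_check" "api" {
  fqdn              = "api.example.com" # placeholder production hostname
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3  # consecutive failures before the check is unhealthy
  request_interval  = 30 # seconds between checks

  tags = { Name = "myapp-prod-api-health" }
}
```

To get notified, pair it with a CloudWatch alarm on the health check's status metric. Route 53 publishes those metrics in us-east-1, so the alarm has to be created there even though the application runs in eu-west-2.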

Error tracking: integrate Sentry into your application from day one. Sentry captures unhandled exceptions with full stack traces, breadcrumbs, user context, and release tracking. The free tier covers up to 5,000 errors per month — more than enough for a startup. Alert to Slack immediately on new issues.

CloudWatch alarms: configure alarms for:

ECS CPU above 80% (sustained 5 min)
ECS memory above 85%
RDS CPU above 80%
RDS storage below 20% free
ALB 5xx error rate above 1%
SQS dead-letter queue depth above 0 (any DLQ message means a failed job)
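
Three of these alarms could be sketched as follows. The SNS topic `alerts` and the dead-letter queue `aws_sqs_queue.jobs_dlq` are hypothetical resources not defined in main.tf. The ALB 5xx rate is left out because it needs a metric math expression.

```hcl
resource "aws_sns_topic" "alerts" {
  name = "myapp-prod-alerts" # hypothetical topic; subscribe email or a chat integration
}

resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
  alarm_name          = "myapp-prod-api-cpu-high"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80
  period              = 60
  evaluation_periods  = 5 # five consecutive one-minute periods
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.api.name
  }
}

resource "aws_cloudwatch_metric_alarm" "rds_storage_low" {
  alarm_name          = "myapp-prod-rds-storage-low"
  namespace           = "AWS/RDS"
  metric_name         = "FreeStorageSpace" # reported in bytes
  statistic           = "Average"
  comparison_operator = "LessThanThreshold"
  # 20% of the allocated storage (GiB) converted to bytes.
  threshold          = 0.2 * aws_db_instance.main.allocated_storage * 1024 * 1024 * 1024
  period             = 300
  evaluation_periods = 1
  alarm_actions      = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.main.identifier
  }
}

resource "aws_cloudwatch_metric_alarm" "dlq_not_empty" {
  alarm_name          = "myapp-prod-dlq-not-empty"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  period              = 300
  evaluation_periods  = 1
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    QueueName = aws_sqs_queue.jobs_dlq.name # hypothetical dead-letter queue
  }
}
```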

Cost Control for Startups

Budget alerts: set up AWS Budgets with alerts at 80% and 100% of your expected monthly spend. Receive alerts via email and SNS. This is a five-minute setup that prevents bill shock.
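
The same budget can be kept in Terraform. In this sketch the limit, email address, and topic name are placeholders, and the limit is expressed in USD on the assumption that the account is billed in USD.

```hcl
resource "aws_sns_topic" "budget_alerts" {
  name = "myapp-budget-alerts" # hypothetical topic name
}

resource "aws_budgets_budget" "monthly" {
  name         = "myapp-monthly"
  budget_type  = "COST"
  limit_amount = "750" # placeholder for your expected monthly spend
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # One notification at 80% and one at 100% of actual spend.
  dynamic "notification" {
    for_each = [80, 100]

    content {
      comparison_operator        = "GREATER_THAN"
      threshold                  = notification.value
      threshold_type             = "PERCENTAGE"
      notification_type          = "ACTUAL"
      subscriber_email_addresses = ["billing@example.com"] # placeholder address
      subscriber_sns_topic_arns  = [aws_sns_topic.budget_alerts.arn]
    }
  }
}
```

For the SNS delivery to work, the topic's access policy must allow the AWS Budgets service to publish to it.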

Auto-stop dev environments: development databases and ECS services don't need to run 24/7. Use EventBridge Scheduler to stop RDS instances and scale ECS desired count to 0 at 7pm on weekdays. Then restart them at 8am. For a typical dev environment costing £200/month on-demand, this saves approximately £130/month (running only 55 hours per week instead of 168).
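
The saving can be checked with a quick calculation, assuming the environment runs from 8am to 7pm on the five weekdays and that the whole bill scales with running hours:

```latex
\begin{align*}
\text{running hours per week} &= 5 \times (19 - 8) = 55 \\
\text{fraction of the week running} &= \frac{55}{168} \approx 0.327 \\
\text{estimated saving} &\approx (1 - 0.327) \times \pounds 200 \approx \pounds 135 \text{ per month}
\end{align*}
```

Storage, snapshots, and networking components keep billing while instances are stopped, which is why the realistic figure sits slightly lower, at roughly £130.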

Right-size from the start: resist the urge to over-provision "just in case". Start with the smallest instance that meets your needs based on load testing, not gut feel. It is easy to scale up. It is psychologically harder to scale down a resource you already provisioned (because you worry about what might break).

Building a SaaS? Get Your Infrastructure Right From Day One.

SpiderHunts Technologies sets up complete, production-ready cloud infrastructure for SaaS startups — Terraform, CI/CD, managed databases, monitoring, secrets management, and security. Done in 2 to 3 weeks. Ready to scale with you.
