# AWS Deployment Guide
Single EC2 instance deployment running Next.js, FastAPI, PostgreSQL, and Nginx.
Target: t3.small (2 vCPU, 2GB RAM) — suitable for up to ~25 concurrent users.
## Architecture

```
┌────────────────────────────────────────┐
│     EC2 t3.small (Docker Compose)      │
│                                        │
│  ┌──────────┐    ┌──────────────────┐  │
│  │ Next.js  │    │     FastAPI      │  │
│  │  :3000   │    │      :9898       │  │
│  └────┬─────┘    └────────┬─────────┘  │
│       │                   │            │
│  ┌────┴───────────────────┴─────────┐  │
│  │       Nginx (:80 / :443)         │  │
│  └──────────────────────────────────┘  │
│                                        │
│  ┌──────────────────────────────────┐  │
│  │         PostgreSQL :5432         │  │
│  │   (containerized, local-only)    │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘
```
## Prerequisites
- AWS account with EC2 and S3 access
- A domain name (optional, but needed for SSL)
- SSH key pair for EC2
## EC2 Instance Setup

### 1. Launch Instance

- AMI: Amazon Linux 2023 or Ubuntu 22.04
- Type: t3.small
- Storage: 30GB gp3
- Security Group:
    - SSH (22) — your IP only
    - HTTP (80) — 0.0.0.0/0
    - HTTPS (443) — 0.0.0.0/0
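The launch can also be scripted. A rough sketch with the AWS CLI (the AMI, key pair, and security group IDs below are placeholders you must replace with your own):

```bash
# All IDs below are placeholders; the security group should carry the
# SSH/HTTP/HTTPS rules listed above.
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type t3.small \
  --key-name your-key-pair \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ai-tutor}]'
```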
### 2. Install Docker

```bash
# Amazon Linux 2023
sudo dnf update -y
sudo dnf install -y docker git
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ec2-user

# Install Docker Compose plugin
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

# Install Docker Buildx (required for multi-stage builds)
# Note: pin the release tag in the URL — the versioned asset name means
# "releases/latest/download" would 404 once a newer Buildx ships.
sudo curl -SL https://github.com/docker/buildx/releases/download/v0.20.1/buildx-v0.20.1.linux-amd64 \
  -o /usr/local/lib/docker/cli-plugins/docker-buildx
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-buildx

# Log out and back in for the group change to take effect
exit
```
### 3. Clone and Configure

```bash
git clone <backend-repo-url> ~/ai-tutor-backend
git clone <frontend-repo-url> ~/ai-tutor-ui
cd ~/ai-tutor-backend/deploy
cp .env.production.example .env.production
```

Edit `.env.production` with your actual values.
Required changes:

- `POSTGRES_PASSWORD` — strong random password
- `SECRET_KEY` — generate with `openssl rand -hex 32`
- `BACKEND_CORS_ORIGINS` — your domain
- `ANTHROPIC_API_KEY` — or Bedrock config
- `BACKUP_S3_BUCKET` — your S3 bucket name
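A minimal sketch of those values (every value below is a placeholder; the authoritative variable list is `.env.production.example`):

```bash
POSTGRES_PASSWORD=<strong-random-password>
SECRET_KEY=<output of: openssl rand -hex 32>
BACKEND_CORS_ORIGINS=https://your-domain.com
ANTHROPIC_API_KEY=<your-anthropic-key>
BACKUP_S3_BUCKET=<your-bucket-name>
```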
### 4. Deploy

Important: the env file is named `.env.production` (not `.env`), so `--env-file .env.production` is required on every `docker compose` command.
Verify everything is running:
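The command block is missing from this page; a plausible sketch, following the compose invocations used elsewhere in this guide:

```bash
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production up -d --build

# Verify everything is running
docker compose --env-file .env.production ps
curl -s http://localhost/health
```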
### 5. Run Database Migrations
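The migration command itself is missing from this page; based on the redeploy steps later in this guide, it is presumably:

```bash
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production exec -T backend alembic upgrade head
```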
### 6. Set Up Backups (Optional — requires S3 bucket)
Skip this step if you don't have an S3 bucket configured yet. The app runs fine without backups — you can set this up later.
```bash
# Install AWS CLI
sudo dnf install -y aws-cli

# Configure credentials
aws configure

# Test backup
cd ~/ai-tutor-backend/deploy
source .env.production
chmod +x scripts/backup-postgres.sh
./scripts/backup-postgres.sh

# Schedule a daily backup at 3 AM
(crontab -l 2>/dev/null; echo "0 3 * * * cd ~/ai-tutor-backend/deploy && source .env.production && ./scripts/backup-postgres.sh >> /var/log/pg-backup.log 2>&1") | crontab -
```

Note: the cron job runs as `ec2-user`, which typically cannot write to `/var/log/`. Create the log file first (`sudo touch /var/log/pg-backup.log && sudo chown ec2-user /var/log/pg-backup.log`) or point the redirect at a path in the home directory.
### 7. SSL with Let's Encrypt (optional but recommended)

`certbot --standalone` binds port 80, so stop the nginx container first (`docker compose --env-file .env.production stop nginx`) and start it again afterwards.

```bash
sudo dnf install -y certbot
sudo certbot certonly --standalone -d your-domain.com

# Then uncomment the SSL sections in:
# - deploy/nginx/nginx.conf (the server blocks at the bottom)
# - deploy/docker-compose.yml (port 443 and the letsencrypt volume)
docker compose --env-file .env.production restart nginx
```
## Common Operations

### View logs

```bash
docker compose --env-file .env.production logs -f backend
docker compose --env-file .env.production logs -f frontend
docker compose --env-file .env.production logs -f postgres
```
### Restart a service
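The command block is missing from this page; the pattern, consistent with the rest of this guide (substitute whichever of `backend`, `frontend`, `nginx`, or `postgres` you need):

```bash
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production restart backend
```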
### Redeploy after code changes

Automatic (default): merge or push to the `staging` branch — GitHub Actions auto-deploys to EC2. No manual steps needed.

Manual fallback (if Actions is unavailable):

```bash
cd ~/ai-tutor-backend && git checkout staging && git pull
cd ~/ai-tutor-ui && git checkout staging && git pull
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production up -d --build
docker compose --env-file .env.production exec -T backend alembic upgrade head
```
### Manual database backup
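The command block is missing from this page; per the backup setup in step 6 above, presumably:

```bash
cd ~/ai-tutor-backend/deploy
source .env.production
./scripts/backup-postgres.sh
```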
### Restore from backup

```bash
aws s3 cp s3://your-bucket/backups/postgres/ai_tutor_20260217.sql.gz /tmp/
gunzip /tmp/ai_tutor_20260217.sql.gz
docker compose --env-file .env.production exec -T postgres psql -U ai_tutor -d ai_tutor < /tmp/ai_tutor_20260217.sql
```
## Monitoring

- Health check: `curl http://3.151.25.120/health`
- Metrics (Prometheus): `curl http://3.151.25.120/metrics`
- Resource usage: `docker stats`
## Cost Estimate

| Item | Monthly |
|---|---|
| EC2 t3.small on-demand | ~$15 |
| 30GB EBS gp3 | ~$2.40 |
| S3 backups | ~$0.50 |
| **Total** | ~$18/mo |
With 1-year Reserved Instance: ~$11/mo
## Future Expansion Guide
As your user base grows, here's the upgrade path — each step is independent.
### Phase 1: Move Docker Builds to GitHub Actions + GHCR ($0 extra)
When: Deploys are slowing down the server, or you want faster/safer deploys.
Why: Right now, Docker images are built on EC2 during deploy. The Next.js build uses most of the server's 2GB RAM for ~2 minutes. Moving builds to GitHub Actions means EC2 just pulls a pre-built image — deploys drop from 2-3 min to ~10 seconds with zero server impact.
How it works:

- Current: push → GitHub Actions → SSH → `git pull` → `docker build` ON EC2 → restart
- Future: push → GitHub Actions builds image → pushes to GHCR → SSH → `docker pull` → restart
Steps:

1. Enable GHCR (GitHub Container Registry) — it's free with your GitHub account (500MB storage on the free plan, plenty for 2 images)
2. Update `.github/workflows/deploy.yml` in both repos to:
    - Check out the code on the GitHub runner
    - Build the Docker image on the runner (7GB RAM, much faster)
    - Tag it as `ghcr.io/ai-teacher-poc/ai-tutor-backend:staging`
    - Push it to GHCR
3. SSH into EC2 and run `docker compose pull && docker compose up -d`
4. Update `deploy/docker-compose.yml` to use registry images instead of local builds
5. Add a GitHub Actions secret `GHCR_TOKEN` (a Personal Access Token with `write:packages` scope) — or use the built-in `GITHUB_TOKEN`, which has automatic GHCR access
6. Log into GHCR on EC2 (one-time setup)
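The login snippet for the one-time setup is not shown on this page; a plausible sketch (the username is a placeholder, and for pulls a PAT with `read:packages` scope is sufficient):

```bash
# Stored in GHCR_TOKEN; --password-stdin keeps the token out of shell history
echo "$GHCR_TOKEN" | docker login ghcr.io -u <github-username> --password-stdin
```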
Result: EC2 never builds images again. Deploys are ~10 seconds. No memory pressure during deploys. Instant rollback by pulling a previous image tag.
### Phase 2: Move PostgreSQL to RDS (~$15/mo extra)
When: 50+ users, or you want automated backups/failover.
Steps:

1. Create an RDS PostgreSQL instance (`db.t4g.micro` or `db.t4g.small`)
2. Dump your local database
3. Import into RDS
4. Update `deploy/.env.production`
5. In `deploy/docker-compose.yml`:
    - Switch the `DATABASE_URL` line (commented instructions are already in the file)
    - Remove the `postgres` service
    - Remove the `postgres_data` volume
    - Remove the `depends_on: postgres` from the backend service
6. Redeploy: `docker compose --env-file .env.production up -d`
7. Remove the backup cron job (RDS handles backups automatically)
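The dump and import snippets for steps 2 and 3 are not shown on this page; a plausible sketch, assuming the service, user, and database names used elsewhere in this guide and a placeholder RDS endpoint:

```bash
# Dump the local containerized database
docker compose --env-file .env.production exec -T postgres \
  pg_dump -U ai_tutor -d ai_tutor > /tmp/ai_tutor.sql

# Import into RDS (the endpoint is a placeholder; you will be prompted for the password)
psql -h <your-rds-endpoint>.rds.amazonaws.com -U ai_tutor -d ai_tutor < /tmp/ai_tutor.sql
```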
### Phase 3: Add SSL and a Domain

When: Going to production / sharing with real users.

Steps:

1. Point your domain's DNS A record to the EC2 Elastic IP
2. Run certbot: `sudo certbot certonly --standalone -d your-domain.com`
3. Uncomment the SSL sections in `deploy/nginx/nginx.conf`
4. Uncomment port 443 and the letsencrypt volume in `deploy/docker-compose.yml`
5. Restart: `docker compose --env-file .env.production restart nginx`
### Phase 4: Separate Frontend to Vercel/Amplify

When: Need CDN, edge caching, or faster global page loads.

Steps:

1. Deploy `ai-tutor-ui` to Vercel with `NEXT_PUBLIC_API_BASE_URL=https://api.your-domain.com/api/v1`
2. Remove the `frontend` service from docker-compose
3. Update Nginx to proxy only `/api/*` to the backend (remove the frontend location block)
4. Update CORS in `.env.production` to allow the Vercel domain
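A minimal sketch of what the trimmed Nginx server block might look like after this change (the server name is a placeholder; the upstream port follows the architecture diagram above, and the real config lives in `deploy/nginx/nginx.conf`):

```nginx
server {
    listen 80;
    server_name api.your-domain.com;

    # Only the API is proxied now; the frontend is served by Vercel
    location /api/ {
        proxy_pass http://backend:9898;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```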
### Phase 5: Containerize with ECS/Fargate
When: 100+ concurrent users, need auto-scaling.
Steps:
- Push Docker images to ECR (Elastic Container Registry)
- Create ECS task definitions using the existing Dockerfiles
- Set up an ALB (Application Load Balancer) to replace Nginx
- Configure auto-scaling policies based on CPU/memory
- The existing Dockerfiles and health checks work as-is with ECS
### Phase 6: Add Caching with ElastiCache

When: Database queries becoming a bottleneck (unlikely at <100 users).

Add a Redis instance for:

- Session caching
- LLM response caching (for repeated questions)
- Rate limiting
## Decision Matrix
| Users | Recommended Setup | Monthly Cost |
|---|---|---|
| 1–25 | Single EC2 (current) | ~$18 |
| 25–50 | EC2 + RDS | ~$35 |
| 50–100 | EC2 + RDS + Vercel | ~$35 + Vercel free tier |
| 100+ | ECS Fargate + RDS + Vercel | ~$80–150 |