AWS Deployment Guide¶
Single EC2 instance deployment running Next.js, FastAPI, PostgreSQL, and Nginx. Docker images are built on GitHub Actions and pushed to GHCR — EC2 only pulls pre-built images.
Target: t3.small (2 vCPU, 2GB RAM) — suitable for up to ~25 concurrent users.
Architecture¶
┌──────────────────────────────────────┐
│ EC2 t3.small (Docker Compose) │
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Next.js │ │ FastAPI │ │
│ │ :3000 │ │ :9898 │ │
│ └────┬─────┘ └───────┬─────────┘ │
│ │ │ │
│ ┌────┴────────────────┴──────────┐ │
│ │ Nginx (:80 / :443) │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ PostgreSQL :5432 │ │
│ │ (containerized, local-only) │ │
│ └────────────────────────────────┘ │
└──────────────────────────────────────┘
Prerequisites¶
- AWS account with EC2 and S3 access
- A domain name (optional, but needed for SSL)
- SSH key pair for EC2
EC2 Instance Setup¶
1. Launch Instance¶
- AMI: Amazon Linux 2023 or Ubuntu 22.04
- Type: t3.small
- Storage: 30GB gp3
- Security Group:
- SSH (22) — your IP only
- HTTP (80) — 0.0.0.0/0
- HTTPS (443) — 0.0.0.0/0
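If you prefer the CLI to the console, the same security group can be sketched with the AWS CLI (the group name is illustrative; replace 203.0.113.10/32 with your own IP):

```shell
# Hypothetical group name — adjust to your naming convention
aws ec2 create-security-group \
  --group-name ai-tutor-sg \
  --description "AI Tutor single-instance deployment"

# SSH from your IP only
aws ec2 authorize-security-group-ingress --group-name ai-tutor-sg \
  --protocol tcp --port 22 --cidr 203.0.113.10/32

# HTTP/HTTPS open to the world
aws ec2 authorize-security-group-ingress --group-name ai-tutor-sg \
  --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name ai-tutor-sg \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
```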
2. Install Docker¶
# Amazon Linux 2023
sudo dnf update -y
sudo dnf install -y docker git
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ec2-user
# Install Docker Compose plugin
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
-o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
# Install Docker Buildx (needed for BuildKit features such as layer caching)
# Note: pin the release in the path — "releases/latest/download/" only works
# if the asset name matches the actual latest release
sudo curl -SL https://github.com/docker/buildx/releases/download/v0.20.1/buildx-v0.20.1.linux-amd64 \
-o /usr/local/lib/docker/cli-plugins/docker-buildx
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-buildx
# Log out and back in for group changes
exit
3. Clone and Configure¶
git clone <backend-repo-url> ~/ai-tutor-backend
git clone <frontend-repo-url> ~/ai-tutor-ui
cd ~/ai-tutor-backend/deploy
cp .env.production.example .env.production
Edit .env.production with your actual values:
Required changes:
- POSTGRES_PASSWORD — strong random password
- SECRET_KEY — openssl rand -hex 32
- BACKEND_CORS_ORIGINS — your domain
- ANTHROPIC_API_KEY or Bedrock config
- BACKUP_S3_BUCKET — your S3 bucket name
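For example, generating the two secrets and sketching the resulting file (values are illustrative; the authoritative key list comes from .env.production.example):

```shell
# Generate strong random values (openssl is preinstalled on Amazon Linux 2023)
POSTGRES_PASSWORD=$(openssl rand -base64 24)
SECRET_KEY=$(openssl rand -hex 32)   # 64 hex characters

# Sketch of the resulting .env.production — domain and bucket are placeholders
cat <<EOF
POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
SECRET_KEY=${SECRET_KEY}
BACKEND_CORS_ORIGINS=https://your-domain.com
BACKUP_S3_BUCKET=your-backup-bucket
EOF
```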
4. Deploy¶
Important: The env file is named .env.production (not .env), so --env-file .env.production is required on every docker compose command.
cd ~/ai-tutor-backend/deploy
# First deploy: login to GHCR, pull images, and start
# (subsequent deploys are handled automatically by GitHub Actions)
echo "YOUR_GITHUB_PAT" | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
docker compose --env-file .env.production pull
docker compose --env-file .env.production up -d
# Or build locally if GHCR images aren't available yet:
# docker compose --env-file .env.production up -d --build
Verify everything is running:
docker compose --env-file .env.production ps
5. Run Database Migrations¶
docker compose --env-file .env.production exec -T backend alembic upgrade head
6. Set Up Backups (Optional — requires S3 bucket)¶
Skip this step if you don't have an S3 bucket configured yet. The app runs fine without backups — you can set this up later.
# Install AWS CLI
sudo dnf install -y aws-cli
# Configure credentials
aws configure
# Test backup
cd ~/ai-tutor-backend/deploy
source .env.production
chmod +x scripts/backup-postgres.sh
./scripts/backup-postgres.sh
# Schedule daily backup at 3 AM (log under $HOME — ec2-user cannot write to /var/log)
(crontab -l 2>/dev/null; echo "0 3 * * * cd ~/ai-tutor-backend/deploy && source .env.production && ./scripts/backup-postgres.sh >> \$HOME/pg-backup.log 2>&1") | crontab -
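The repo's deploy/scripts/backup-postgres.sh is the source of truth; conceptually it does something like the following (a hypothetical sketch — names and paths are illustrative, matched to the restore example later in this guide):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pg_dump-to-S3 backup; the real script may differ.
set -euo pipefail

STAMP=$(date +%Y%m%d)
FILE="ai_tutor_${STAMP}.sql.gz"

# Dump from the running postgres container and compress
docker compose --env-file .env.production exec -T postgres \
  pg_dump -U ai_tutor ai_tutor | gzip > "/tmp/${FILE}"

# Ship to S3 (BACKUP_S3_BUCKET comes from .env.production)
aws s3 cp "/tmp/${FILE}" "s3://${BACKUP_S3_BUCKET}/backups/postgres/${FILE}"
rm -f "/tmp/${FILE}"
```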
7. SSL with Let's Encrypt (optional but recommended)¶
sudo dnf install -y certbot
# Standalone mode binds port 80 itself, so stop nginx for a moment
docker compose --env-file .env.production stop nginx
sudo certbot certonly --standalone -d your-domain.com
# Then uncomment the SSL sections in:
# - deploy/nginx/nginx.conf (the server blocks at the bottom)
# - deploy/docker-compose.yml (port 443 and letsencrypt volume)
# Use up -d (not restart) so the new port 443 mapping takes effect
docker compose --env-file .env.production up -d nginx
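Once uncommented, the SSL block will have roughly this shape (a sketch — the real server blocks are already in deploy/nginx/nginx.conf; only the domain needs substituting, and the proxy targets assume the compose service names):

```nginx
# Illustrative shape of the SSL server block in deploy/nginx/nginx.conf
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    location /api/ {
        proxy_pass http://backend:9898;
        proxy_set_header Host $host;
    }
    location / {
        proxy_pass http://frontend:3000;
        proxy_set_header Host $host;
    }
}
```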
Common Operations¶
View logs¶
docker compose --env-file .env.production logs -f backend
docker compose --env-file .env.production logs -f frontend
docker compose --env-file .env.production logs -f postgres
Restart a service¶
docker compose --env-file .env.production restart backend
Redeploy after code changes¶
Automatic (default): Merge or push to staging branch — GitHub Actions builds a Docker image on its runner (7GB RAM), pushes it to GHCR, then SSHs into EC2 to pull and restart. EC2 never builds images.
Manual fallback (if Actions is unavailable):
# Pull pre-built images from GHCR
cd ~/ai-tutor-backend && git checkout staging && git pull
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production pull
docker compose --env-file .env.production up -d
docker compose --env-file .env.production exec -T backend alembic upgrade head
# Or build locally on EC2 (slow, avoid if possible)
docker compose --env-file .env.production up -d --build
Rollback to a previous version¶
Each deploy tags the image with the git commit SHA. To rollback:
cd ~/ai-tutor-backend/deploy
# Find available tags at: https://github.com/orgs/AI-Teacher-POC/packages
# Then edit docker-compose.yml to pin the image tag, e.g.:
# image: ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-abc1234
# Or pull a specific tag directly:
docker pull ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-<commit-sha>
docker tag ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-<commit-sha> ghcr.io/ai-teacher-poc/ai-tutor-backend:staging
docker compose --env-file .env.production up -d backend
Manual database backup¶
cd ~/ai-tutor-backend/deploy && ./scripts/backup-postgres.sh
Restore from backup¶
aws s3 cp s3://your-bucket/backups/postgres/ai_tutor_20260217.sql.gz /tmp/
gunzip /tmp/ai_tutor_20260217.sql.gz
docker compose --env-file .env.production exec -T postgres psql -U ai_tutor -d ai_tutor < /tmp/ai_tutor_20260217.sql
Monitoring & Observability¶
The application uses a multi-layer observability stack. For the complete guide, see Observability Guide.
Quick reference:
| Layer | Tool | What It Catches |
|---|---|---|
| Error tracking | Sentry | Application crashes, exceptions, slow queries |
| Uptime monitoring | UptimeRobot | Site completely unreachable |
| Container logs | CloudWatch Logs | All stdout/stderr from backend, frontend, nginx |
| Infrastructure | CloudWatch Alarms | CPU spikes, EC2 status check failures |
Quick checks:
# Health check (API lives on learn subdomain)
curl https://learn.2sigma.io/health
# Prometheus metrics (raw)
curl https://learn.2sigma.io/metrics
# View container logs on EC2
docker compose --env-file .env.production logs -f backend
# View CloudWatch logs (from dev machine)
aws logs tail /ai-tutor/backend --follow --profile 2sigma
Dashboards:
- Sentry: https://sentry.io (org: 2sigma)
- UptimeRobot: https://uptimerobot.com
- CloudWatch: AWS Console → CloudWatch → Log Groups / Alarms
Cost Estimate¶
| Item | Monthly |
|---|---|
| EC2 t3.small on-demand | ~$15 |
| 30GB EBS gp3 | ~$2.40 |
| S3 backups | ~$0.50 |
| Total | ~$18/mo |
With 1-year Reserved Instance: ~$11/mo
Future Expansion Guide¶
As your user base grows, here's the upgrade path — each step is independent.
Phase 1: Move Docker Builds to GitHub Actions + GHCR ($0 extra) — DONE¶
Status: Implemented. Docker images are built on GitHub Actions runners and pushed to GHCR. EC2 only pulls pre-built images.
How it works:
Push to staging → GitHub Actions builds image (7GB RAM runner) → pushes to GHCR → SSH → docker pull → restart
Components:
- `.github/workflows/deploy.yml` in both repos — builds, pushes to GHCR, deploys via SSH
- `deploy/docker-compose.yml` — `image:` directives point to `ghcr.io/ai-teacher-poc/`
- GHCR authentication: `GITHUB_TOKEN` (automatic) pushes images; the same token is passed via SSH for EC2 pulls
- Image tags: `:staging` (latest) and `:staging-{commit-sha}` (for rollback)
- Docker layer caching via GitHub Actions cache (`type=gha`) speeds up subsequent builds
Result: EC2 never builds images. Deploys take ~10 seconds on EC2. No memory pressure during deploys. Instant rollback by pulling a previous image tag.
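The workflow has roughly this shape (an abridged sketch — the authoritative file is .github/workflows/deploy.yml in each repo; action versions and secret names here are illustrative):

```yaml
# Abridged sketch of .github/workflows/deploy.yml — illustrative, not verbatim
name: Deploy
on:
  push:
    branches: [staging]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: |
            ghcr.io/ai-teacher-poc/ai-tutor-backend:staging
            ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Deploy over SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.EC2_HOST }}
          username: ec2-user
          key: ${{ secrets.EC2_SSH_KEY }}
          script: |
            cd ~/ai-tutor-backend/deploy
            docker compose --env-file .env.production pull
            docker compose --env-file .env.production up -d
```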
One-time setup for Sentry source maps (optional):
Add NEXT_PUBLIC_SENTRY_DSN as a GitHub Actions variable (Settings → Variables → Actions) in the ai-tutor-ui repo so the Sentry DSN is embedded during the frontend build. This is a public identifier (not a secret).
Adding new NEXT_PUBLIC_* environment variables:
Next.js inlines NEXT_PUBLIC_* values at build time — they cannot be set at container runtime. When adding a new one, all three of these steps are required:
- Dockerfile (`ai-tutor-ui/Dockerfile`) — add `ARG` and include it in the `ENV` block
- GitHub Actions workflow (`ai-tutor-ui/.github/workflows/deploy.yml`) — add to the `build-args` list, referencing `${{ vars.YOUR_VAR_NAME }}`
- GitHub repo variable — set the value in the `ai-tutor-ui` repo under Settings → Variables → Actions
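Wiring a hypothetical NEXT_PUBLIC_FOO through the Dockerfile side might look like this (illustrative name; the exact stage layout depends on ai-tutor-ui/Dockerfile):

```dockerfile
# In ai-tutor-ui/Dockerfile, build stage — NEXT_PUBLIC_FOO is a hypothetical example
ARG NEXT_PUBLIC_FOO
ENV NEXT_PUBLIC_FOO=${NEXT_PUBLIC_FOO}
```

The matching workflow entry would then be a `build-args` line such as `NEXT_PUBLIC_FOO=${{ vars.NEXT_PUBLIC_FOO }}`, with the value itself stored as a repo variable.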
Current NEXT_PUBLIC_* variables configured as build args:
| Variable | Purpose |
|---|---|
| NEXT_PUBLIC_API_BASE_URL | API base path (hardcoded /api/v1) |
| NEXT_PUBLIC_SENTRY_DSN | Sentry error tracking DSN |
| NEXT_PUBLIC_SUPPORT_WHATSAPP | WhatsApp number for support widget |
| NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY | Stripe publishable key for client-side payments |
Non-NEXT_PUBLIC runtime variables (set in docker-compose.yml environment):
| Variable | Purpose |
|---|---|
| INTERNAL_API_URL | Server-side API URL bypassing nginx (http://backend:9898/api/v1) |
Phase 2: Move PostgreSQL to RDS (~$15/mo extra)¶
When: 50+ users, or you want automated backups/failover.
Steps:
- Create an RDS PostgreSQL instance (`db.t4g.micro` or `db.t4g.small`)
- Dump your local database:
  docker compose --env-file .env.production exec -T postgres pg_dump -U ai_tutor ai_tutor > /tmp/ai_tutor.sql
- Import into RDS:
  psql -h <rds-endpoint> -U ai_tutor -d ai_tutor < /tmp/ai_tutor.sql
- Update deploy/.env.production:
  - Switch the `DATABASE_URL` line (commented instructions are already in the file)
- In deploy/docker-compose.yml:
  - Remove the `postgres` service
  - Remove the `postgres_data` volume
  - Remove the `depends_on: postgres` from backend
- Redeploy: docker compose --env-file .env.production up -d
- Remove the backup cron job (RDS handles backups automatically)
Phase 3: Add SSL and a Domain¶
When: Going to production / sharing with real users.
Steps:
- Point your domain's DNS A record to the EC2 Elastic IP
- Run certbot:
sudo certbot certonly --standalone -d your-domain.com - Uncomment the SSL sections in
deploy/nginx/nginx.conf - Uncomment port 443 and letsencrypt volume in
deploy/docker-compose.yml - Restart:
docker compose --env-file .env.production restart nginx
Phase 4: Separate Frontend to Vercel/Amplify¶
When: Need CDN, edge caching, or faster global page loads.
Steps:
- Deploy ai-tutor-ui to Vercel with NEXT_PUBLIC_API_BASE_URL=https://api.your-domain.com/api/v1
- Remove the frontend service from docker-compose
- Update Nginx to only proxy /api/* to the backend (remove the frontend location block)
- Update CORS in .env.production to allow the Vercel domain
Phase 5: Containerize with ECS/Fargate¶
When: 100+ concurrent users, need auto-scaling.
Steps:
- Push Docker images to ECR (Elastic Container Registry)
- Create ECS task definitions using the existing Dockerfiles
- Set up an ALB (Application Load Balancer) to replace Nginx
- Configure auto-scaling policies based on CPU/memory
- The existing Dockerfiles and health checks work as-is with ECS
Phase 6: Add Caching with ElastiCache¶
When: Database queries becoming a bottleneck (unlikely at <100 users).
Add a Redis instance for:
- Session caching
- LLM response caching (for repeated questions)
- Rate limiting
Decision Matrix¶
| Users | Recommended Setup | Monthly Cost |
|---|---|---|
| 1–25 | Single EC2 (current) | ~$18 |
| 25–50 | EC2 + RDS | ~$35 |
| 50–100 | EC2 + RDS + Vercel | ~$35 + Vercel free tier |
| 100+ | ECS Fargate + RDS + Vercel | ~$80–150 |