AWS Deployment Guide

Single EC2 instance deployment running Next.js, FastAPI, PostgreSQL, and Nginx. Docker images are built on GitHub Actions and pushed to GHCR — EC2 only pulls pre-built images.

Target: t3.small (2 vCPU, 2GB RAM) — suitable for up to ~25 concurrent users.

Architecture

┌──────────────────────────────────────┐
│  EC2 t3.small (Docker Compose)       │
│                                      │
│  ┌──────────┐  ┌──────────────────┐  │
│  │ Next.js  │  │  FastAPI         │  │
│  │ :3000    │  │  :9898           │  │
│  └────┬─────┘  └───────┬──────────┘  │
│       │                │             │
│  ┌────┴────────────────┴──────────┐  │
│  │  Nginx (:80 / :443)            │  │
│  └────────────────────────────────┘  │
│                                      │
│  ┌────────────────────────────────┐  │
│  │  PostgreSQL :5432              │  │
│  │  (containerized, local-only)   │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

Prerequisites

  • AWS account with EC2 and S3 access
  • A domain name (optional, but needed for SSL)
  • SSH key pair for EC2

EC2 Instance Setup

1. Launch Instance

  • AMI: Amazon Linux 2023 or Ubuntu 22.04
  • Type: t3.small
  • Storage: 30GB gp3
  • Security Group:
      • SSH (22) — your IP only
      • HTTP (80) — 0.0.0.0/0
      • HTTPS (443) — 0.0.0.0/0

2. Install Docker

# Amazon Linux 2023
sudo dnf update -y
sudo dnf install -y docker git
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker ec2-user

# Install Docker Compose plugin
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose

# Install Docker Buildx (BuildKit builder used by the Dockerfiles)
sudo curl -SL https://github.com/docker/buildx/releases/download/v0.20.1/buildx-v0.20.1.linux-amd64 \
  -o /usr/local/lib/docker/cli-plugins/docker-buildx
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-buildx

# Log out and back in for group changes
exit

3. Clone and Configure

git clone <backend-repo-url> ~/ai-tutor-backend
git clone <frontend-repo-url> ~/ai-tutor-ui
cd ~/ai-tutor-backend/deploy

cp .env.production.example .env.production

Edit .env.production with your actual values:

nano .env.production

Required changes:

  • POSTGRES_PASSWORD — strong random password
  • SECRET_KEY — generate with openssl rand -hex 32
  • BACKEND_CORS_ORIGINS — your domain
  • ANTHROPIC_API_KEY or Bedrock config
  • BACKUP_S3_BUCKET — your S3 bucket name
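
For example, the two secret values can be generated with openssl (a sketch — any strong random source works; the key lengths here are illustrative):

```shell
# Generate a 64-char hex SECRET_KEY and a random POSTGRES_PASSWORD
SECRET_KEY=$(openssl rand -hex 32)
POSTGRES_PASSWORD=$(openssl rand -base64 24)
echo "SECRET_KEY=$SECRET_KEY"
echo "POSTGRES_PASSWORD=$POSTGRES_PASSWORD"
```

Paste the printed values into .env.production rather than reusing passwords from elsewhere.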

4. Deploy

Important: The env file is named .env.production (not .env), so --env-file .env.production is required on every docker compose command.
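
To avoid retyping the flag, a small wrapper function can help (a local convenience sketch, not part of the repo):

```shell
# dc: forward any arguments to docker compose with the production env file
dc() {
  docker compose --env-file .env.production "$@"
}

# Example usage: dc ps / dc logs -f backend / dc restart nginx
```

Add it to ~/.bashrc if you want it in every session; the commands below spell out the full flag for copy-paste safety.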

cd ~/ai-tutor-backend/deploy

# First deploy: login to GHCR, pull images, and start
# (subsequent deploys are handled automatically by GitHub Actions)
echo "YOUR_GITHUB_PAT" | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin
docker compose --env-file .env.production pull
docker compose --env-file .env.production up -d

# Or build locally if GHCR images aren't available yet:
# docker compose --env-file .env.production up -d --build

Verify everything is running:

docker compose --env-file .env.production ps
curl http://localhost/health
curl http://localhost/

5. Run Database Migrations

docker compose --env-file .env.production exec backend alembic upgrade head

6. Set Up Backups (Optional — requires S3 bucket)

Skip this step if you don't have an S3 bucket configured yet. The app runs fine without backups — you can set this up later.

# Install AWS CLI
sudo dnf install -y aws-cli

# Configure credentials
aws configure

# Test backup
cd ~/ai-tutor-backend/deploy
source .env.production
chmod +x scripts/backup-postgres.sh
./scripts/backup-postgres.sh

# Schedule daily backup at 3 AM
(crontab -l 2>/dev/null; echo "0 3 * * * cd ~/ai-tutor-backend/deploy && . ./.env.production && ./scripts/backup-postgres.sh >> ~/pg-backup.log 2>&1") | crontab -

7. Set Up SSL (Optional — requires a domain)

sudo dnf install -y certbot
sudo certbot certonly --standalone -d your-domain.com

# Then uncomment the SSL sections in:
# - deploy/nginx/nginx.conf (the server blocks at the bottom)
# - deploy/docker-compose.yml (port 443 and letsencrypt volume)

docker compose --env-file .env.production restart nginx

Common Operations

View logs

docker compose --env-file .env.production logs -f backend
docker compose --env-file .env.production logs -f frontend
docker compose --env-file .env.production logs -f postgres

Restart a service

docker compose --env-file .env.production restart backend

Redeploy after code changes

Automatic (default): Merge or push to staging branch — GitHub Actions builds a Docker image on its runner (7GB RAM), pushes it to GHCR, then SSHs into EC2 to pull and restart. EC2 never builds images.

git checkout staging && git merge main && git push && git checkout main

Manual fallback (if Actions is unavailable):

# Pull pre-built images from GHCR
cd ~/ai-tutor-backend && git checkout staging && git pull
cd ~/ai-tutor-backend/deploy
docker compose --env-file .env.production pull
docker compose --env-file .env.production up -d
docker compose --env-file .env.production exec -T backend alembic upgrade head

# Or build locally on EC2 (slow, avoid if possible)
docker compose --env-file .env.production up -d --build

Rollback to a previous version

Each deploy tags the image with the git commit SHA. To rollback:

cd ~/ai-tutor-backend/deploy

# Find available tags at: https://github.com/orgs/AI-Teacher-POC/packages
# Then edit docker-compose.yml to pin the image tag, e.g.:
#   image: ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-abc1234

# Or pull a specific tag directly:
docker pull ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-<commit-sha>
docker tag ghcr.io/ai-teacher-poc/ai-tutor-backend:staging-<commit-sha> ghcr.io/ai-teacher-poc/ai-tutor-backend:staging
docker compose --env-file .env.production up -d backend

Manual database backup

cd ~/ai-tutor-backend/deploy
source .env.production
./scripts/backup-postgres.sh

Restore from backup

aws s3 cp s3://your-bucket/backups/postgres/ai_tutor_20260217.sql.gz /tmp/
gunzip /tmp/ai_tutor_20260217.sql.gz
docker compose --env-file .env.production exec -T postgres psql -U ai_tutor -d ai_tutor < /tmp/ai_tutor_20260217.sql

Monitoring & Observability

The application uses a multi-layer observability stack. For the complete guide, see Observability Guide.

Quick reference:

Layer               Tool                What It Catches
Error tracking      Sentry              Application crashes, exceptions, slow queries
Uptime monitoring   UptimeRobot         Site completely unreachable
Container logs      CloudWatch Logs     All stdout/stderr from backend, frontend, nginx
Infrastructure      CloudWatch Alarms   CPU spikes, EC2 status check failures

Quick checks:

# Health check (API lives on learn subdomain)
curl https://learn.2sigma.io/health

# Prometheus metrics (raw)
curl https://learn.2sigma.io/metrics

# View container logs on EC2
docker compose --env-file .env.production logs -f backend

# View CloudWatch logs (from dev machine)
aws logs tail /ai-tutor/backend --follow --profile 2sigma

Dashboards:

  • Sentry: https://sentry.io (org: 2sigma)
  • UptimeRobot: https://uptimerobot.com
  • CloudWatch: AWS Console → CloudWatch → Log Groups / Alarms

Cost Estimate

Item                     Monthly
EC2 t3.small on-demand   ~$15
30GB EBS gp3             ~$2.40
S3 backups               ~$0.50
Total                    ~$18/mo

With 1-year Reserved Instance: ~$11/mo


Future Expansion Guide

As your user base grows, here's the upgrade path — each step is independent.

Phase 1: Move Docker Builds to GitHub Actions + GHCR ($0 extra) — DONE

Status: Implemented. Docker images are built on GitHub Actions runners and pushed to GHCR. EC2 only pulls pre-built images.

How it works:

Push to staging → GitHub Actions builds image (7GB RAM runner) → pushes to GHCR → SSH → docker pull → restart

Components:

  • .github/workflows/deploy.yml in both repos — builds, pushes to GHCR, deploys via SSH
  • deploy/docker-compose.yml — image: directives point to ghcr.io/ai-teacher-poc/
  • GHCR authentication: GITHUB_TOKEN (automatic) pushes images; same token is passed via SSH for EC2 pulls
  • Image tags: :staging (latest) and :staging-{commit-sha} (for rollback)
  • Docker layer caching via GitHub Actions cache (type=gha) speeds up subsequent builds
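
The image: directives in deploy/docker-compose.yml follow this shape (a sketch — the ai-tutor-ui image name is an assumption based on the repo name):

```yaml
services:
  backend:
    # :staging tracks the latest staging build; pin :staging-<sha> to roll back
    image: ghcr.io/ai-teacher-poc/ai-tutor-backend:staging
  frontend:
    image: ghcr.io/ai-teacher-poc/ai-tutor-ui:staging
```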

Result: EC2 never builds images. Deploys take ~10 seconds on EC2. No memory pressure during deploys. Instant rollback by pulling a previous image tag.

One-time setup for Sentry source maps (optional):

Add NEXT_PUBLIC_SENTRY_DSN as a GitHub Actions variable (Settings → Variables → Actions) in the ai-tutor-ui repo so the Sentry DSN is embedded during the frontend build. This is a public identifier (not a secret).

Adding new NEXT_PUBLIC_* environment variables:

Next.js inlines NEXT_PUBLIC_* values at build time — they cannot be set at container runtime. When adding a new one, all three of these steps are required:

  1. Dockerfile (ai-tutor-ui/Dockerfile) — add ARG and include in the ENV block
  2. GitHub Actions workflow (ai-tutor-ui/.github/workflows/deploy.yml) — add to the build-args list, referencing ${{ vars.YOUR_VAR_NAME }}
  3. GitHub repo variable — set the value in ai-tutor-ui repo Settings → Variables → Actions
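
As a sketch, step 1 in ai-tutor-ui/Dockerfile might look like the following (NEXT_PUBLIC_NEW_VAR is a hypothetical placeholder for your new variable):

```dockerfile
# Accept the value from the workflow's build-args (hypothetical variable name)
ARG NEXT_PUBLIC_NEW_VAR
# Expose it during `next build` so Next.js can inline the value
ENV NEXT_PUBLIC_NEW_VAR=$NEXT_PUBLIC_NEW_VAR
```

In step 2, the workflow's build-args entry would then pass NEXT_PUBLIC_NEW_VAR=${{ vars.NEXT_PUBLIC_NEW_VAR }}.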

Current NEXT_PUBLIC_* variables configured as build args:

Variable                             Purpose
NEXT_PUBLIC_API_BASE_URL             API base path (hardcoded /api/v1)
NEXT_PUBLIC_SENTRY_DSN               Sentry error tracking DSN
NEXT_PUBLIC_SUPPORT_WHATSAPP         WhatsApp number for support widget
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY   Stripe publishable key for client-side payments

Non-NEXT_PUBLIC runtime variables (set in docker-compose.yml environment):

Variable           Purpose
INTERNAL_API_URL   Server-side API URL bypassing nginx (http://backend:9898/api/v1)
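
In docker-compose terms, that runtime variable might be wired up like this sketch (service name frontend assumed):

```yaml
services:
  frontend:
    environment:
      # Server-side fetches go straight to the backend container, bypassing nginx
      INTERNAL_API_URL: http://backend:9898/api/v1
```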

Phase 2: Move PostgreSQL to RDS (~$15/mo extra)

When: 50+ users, or you want automated backups/failover.

Steps:

  1. Create an RDS PostgreSQL instance (db.t4g.micro or db.t4g.small)
  2. Dump your local database:
    docker compose --env-file .env.production exec postgres pg_dump -U ai_tutor -d ai_tutor > dump.sql
    
  3. Import into RDS:
    psql -h your-rds-endpoint.amazonaws.com -U ai_tutor -d ai_tutor < dump.sql
    
  4. Update deploy/.env.production:
    # Comment out local postgres vars
    # POSTGRES_USER=...
    # POSTGRES_PASSWORD=...
    
    # Add RDS connection
    RDS_USER=ai_tutor
    RDS_PASSWORD=your_rds_password
    RDS_ENDPOINT=your-db.xxxx.us-east-1.rds.amazonaws.com
    RDS_DB=ai_tutor
    
  5. In deploy/docker-compose.yml:
      • Switch the DATABASE_URL line (commented instructions are already in the file)
      • Remove the postgres service
      • Remove the postgres_data volume
      • Remove depends_on: postgres from backend
  6. Redeploy: docker compose --env-file .env.production up -d
  7. Remove the backup cron job (RDS handles backups automatically)

Phase 3: Add SSL and a Domain

When: Going to production / sharing with real users.

Steps:

  1. Point your domain's DNS A record to the EC2 Elastic IP
  2. Run certbot: sudo certbot certonly --standalone -d your-domain.com
  3. Uncomment the SSL sections in deploy/nginx/nginx.conf
  4. Uncomment port 443 and letsencrypt volume in deploy/docker-compose.yml
  5. Restart: docker compose --env-file .env.production restart nginx

Phase 4: Separate Frontend to Vercel/Amplify

When: Need CDN, edge caching, or faster global page loads.

Steps:

  1. Deploy ai-tutor-ui to Vercel with NEXT_PUBLIC_API_BASE_URL=https://api.your-domain.com/api/v1
  2. Remove the frontend service from docker-compose
  3. Update Nginx to only proxy /api/* to the backend (remove the frontend location block)
  4. Update CORS in .env.production to allow Vercel domain
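
Step 3's nginx change might reduce the config to something like this sketch (illustrative directives, not the repo's exact file — only the backend upstream and port come from this guide):

```nginx
# Once the frontend lives on Vercel, nginx only proxies API traffic
location /api/ {
    proxy_pass http://backend:9898;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```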

Phase 5: Containerize with ECS/Fargate

When: 100+ concurrent users, need auto-scaling.

Steps:

  1. Push Docker images to ECR (Elastic Container Registry)
  2. Create ECS task definitions using the existing Dockerfiles
  3. Set up an ALB (Application Load Balancer) to replace Nginx
  4. Configure auto-scaling policies based on CPU/memory
  5. The existing Dockerfiles and health checks work as-is with ECS

Phase 6: Add Caching with ElastiCache

When: Database queries becoming a bottleneck (unlikely at <100 users).

Add a Redis instance for:

  • Session caching
  • LLM response caching (for repeated questions)
  • Rate limiting

Decision Matrix

Users    Recommended Setup            Monthly Cost
1–25     Single EC2 (current)         ~$18
25–50    EC2 + RDS                    ~$35
50–100   EC2 + RDS + Vercel           ~$35 + Vercel free tier
100+     ECS Fargate + RDS + Vercel   ~$80–150