Migrate Data to AWS Staging

This guide walks through copying PostgreSQL data from your local development database to the AWS staging environment running on EC2.

Overview

The staging environment runs PostgreSQL inside a Docker container on an EC2 instance. To seed it with real data, you dump the local database, transfer the file to EC2 via SCP, copy it into the container, and restore it with psql.

This is a one-time or occasional operation, not part of the normal deployment flow.

Check for existing data first

If the AWS database already contains real user data, do not proceed without backing it up first. The restore only adds rows on top of existing data, but if you truncate tables beforehand you will lose whatever was there. Run a quick count on the AWS side before starting.
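One way to make that pre-check hard to skip is a small guard. `require_empty` below is a hypothetical helper, not project tooling; the `docker exec` line in the comment uses the container name and credentials from this guide:

```shell
# Hypothetical guard: refuse to proceed if a table already has rows.
# Get the count from the AWS side with, e.g.:
#   docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor \
#     -tAc "SELECT count(*) FROM users"
require_empty() {
  table="$1"; count="$2"
  if [ "$count" -gt 0 ]; then
    echo "ABORT: $table already has $count rows - back it up first"
    return 1
  fi
  echo "OK: $table is empty"
}

# Demo:
require_empty users 0
require_empty users 42 || true
```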

Prerequisites

  • SSH access to the EC2 instance using the Terraform PEM key at ai-tutor-backend/terraform/ai-tutor-staging.pem
  • Local PostgreSQL running with data you want to migrate
  • AWS EC2 instance running with Docker Compose services up (deploy-postgres-1 container healthy)
  • pg_dump and psql installed locally
  • scp available locally

Connection Details Reference

             Local          AWS (EC2)
Host         localhost      3.151.25.120
Port         5432           5432 (internal to Docker)
Database     ai_tutor_poc   ai_tutor
User         admin          ai_tutor

Step-by-Step Procedure

Step 1: Verify local DB has data

Connect to your local database and check row counts for key tables:

psql -U admin -d ai_tutor_poc -c "
SELECT
  schemaname,
  relname AS table_name,
  n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10;
"

Approximate vs exact counts

pg_stat_user_tables row counts are estimates based on autovacuum statistics and can be stale. For exact verification, use SELECT count(*) FROM <table> on the tables that matter most.

For exact counts on key tables:

psql -U admin -d ai_tutor_poc -c "
SELECT
  (SELECT count(*) FROM users) AS users,
  (SELECT count(*) FROM chat_sessions) AS chat_sessions,
  (SELECT count(*) FROM chat_messages) AS chat_messages,
  (SELECT count(*) FROM prompts) AS prompts;
"

Note these numbers. You'll compare them against AWS after the restore.

Step 2: Verify AWS schema is up to date

SSH into EC2 and check that all migrations have been applied:

ssh -i ai-tutor-backend/terraform/ai-tutor-staging.pem ec2-user@3.151.25.120

Once connected, check the migration state:

cd /home/ec2-user/deploy
docker compose --env-file .env.production exec backend alembic current

The output should show (head) next to the latest revision. If it doesn't, run alembic upgrade head before proceeding.
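If you want to script that check, a minimal sketch — `at_head` is a hypothetical helper that just greps the `alembic current` output:

```shell
# Hypothetical helper: succeed only if stdin mentions "(head)".
at_head() { grep -q '(head)'; }

# On EC2 this would gate the restore, e.g.:
#   docker compose --env-file .env.production exec backend alembic current | at_head \
#     || echo "run: alembic upgrade head first"
# Demo on canned output:
printf '3f2a9c1b42de (head)\n' | at_head && echo "schema is at head"
```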

You can also list tables directly to confirm the schema exists:

docker compose --env-file .env.production exec postgres \
  psql -U ai_tutor -d ai_tutor -c "\dt"

Exit the SSH session before continuing.

Step 3: Dump the local database

Run pg_dump in data-only mode, excluding the alembic_version table (the AWS DB manages its own migration state):

pg_dump \
  -U admin \
  -d ai_tutor_poc \
  --data-only \
  --no-owner \
  --no-privileges \
  --disable-triggers \
  --exclude-table=alembic_version \
  -f /tmp/ai_tutor_data.sql

Verify the file was created and looks reasonable:

ls -lh /tmp/ai_tutor_data.sql
head -50 /tmp/ai_tutor_data.sql
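To automate that eyeball check, here is a sketch — `check_dump` is a hypothetical helper, and it assumes a plain-SQL, data-only dump, which should contain COPY (or INSERT) statements:

```shell
# Hypothetical sanity check for a data-only SQL dump.
check_dump() {
  f="$1"
  [ -s "$f" ] || { echo "FAIL: $f is missing or empty"; return 1; }
  n=$(grep -cE '^(COPY |INSERT )' "$f" || true)
  if [ "$n" -eq 0 ]; then
    echo "FAIL: no data statements in $f"
    return 1
  fi
  echo "OK: $n data statement(s) in $f"
}

# Demo on a tiny synthetic dump; for the real file:
#   check_dump /tmp/ai_tutor_data.sql
printf 'COPY public.users (id) FROM stdin;\n1\n\\.\n' > /tmp/demo_dump.sql
check_dump /tmp/demo_dump.sql
```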

Step 4: Transfer the dump to EC2

Use scp with the Terraform PEM key:

scp -i ai-tutor-backend/terraform/ai-tutor-staging.pem \
  /tmp/ai_tutor_data.sql \
  ec2-user@3.151.25.120:/tmp/ai_tutor_data.sql
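An optional integrity check after the transfer: compare SHA-256 digests on both ends. `sum_of` is a hypothetical helper; the ssh invocation in the comment uses the key and host from this guide:

```shell
# Hypothetical helper: print just the hex digest of a file.
sum_of() { sha256sum "$1" | cut -d' ' -f1; }

# Locally:
#   sum_of /tmp/ai_tutor_data.sql
# On EC2:
#   ssh -i ai-tutor-backend/terraform/ai-tutor-staging.pem \
#     ec2-user@3.151.25.120 'sha256sum /tmp/ai_tutor_data.sql'
# The two digests must be identical. Demo on a throwaway file:
printf 'hello\n' > /tmp/demo_scp_check
sum_of /tmp/demo_scp_check
```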

Step 5: Restore on AWS

SSH back into EC2:

ssh -i ai-tutor-backend/terraform/ai-tutor-staging.pem ec2-user@3.151.25.120

Copy the dump file into the Postgres container:

docker cp /tmp/ai_tutor_data.sql deploy-postgres-1:/tmp/ai_tutor_data.sql

Run the restore:

docker exec -i deploy-postgres-1 \
  psql -U ai_tutor -d ai_tutor -f /tmp/ai_tutor_data.sql

You'll see a stream of INSERT statements and possibly some SET commands. Warnings about transaction_timeout or lock_timeout are normal and can be ignored. Watch for any ERROR: lines, which indicate actual problems.
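Because psql keeps going after per-statement errors by default, a single ERROR: line is easy to miss in a long scroll. A sketch for capturing and scanning the output — `count_errors` is a hypothetical helper, and the tee'd restore invocation in the comment reuses this guide's command:

```shell
# Capture the restore output; psql errors go to stderr, hence 2>&1:
#   docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor \
#     -f /tmp/ai_tutor_data.sql 2>&1 | tee /tmp/restore.log
# (Add -v ON_ERROR_STOP=1 to psql if you prefer aborting on the first error.)

# Hypothetical helper: count ERROR lines in the saved log.
count_errors() { grep -c '^ERROR:' "$1" || true; }

# Demo on a synthetic log:
printf 'SET\nINSERT 0 1\nERROR:  duplicate key value\n' > /tmp/demo_restore.log
count_errors /tmp/demo_restore.log
```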

Step 6: Verify row counts match

Still inside the EC2 SSH session, run exact counts against the AWS database:

docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor -c "
SELECT
  (SELECT count(*) FROM users) AS users,
  (SELECT count(*) FROM chat_sessions) AS chat_sessions,
  (SELECT count(*) FROM chat_messages) AS chat_messages,
  (SELECT count(*) FROM prompts) AS prompts;
"

Compare these numbers against what you recorded in Step 1. They should match.
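The comparison can be scripted if you saved each side's counts as `table|count` lines (psql's `-t -A` flags produce exactly that shape). `compare_counts` is a hypothetical helper, sorting both files so row order doesn't matter:

```shell
# Hypothetical helper: diff two "table|count" files, ignoring row order.
compare_counts() {
  sort "$1" > /tmp/_counts_a.sorted
  sort "$2" > /tmp/_counts_b.sorted
  if diff /tmp/_counts_a.sorted /tmp/_counts_b.sorted > /dev/null; then
    echo "MATCH"
  else
    echo "MISMATCH"
    diff /tmp/_counts_a.sorted /tmp/_counts_b.sorted || true
  fi
}

# Demo: same counts, different row order, still a match.
printf 'users|12\nprompts|3\n'  > /tmp/local_counts.txt
printf 'prompts|3\nusers|12\n' > /tmp/aws_counts.txt
compare_counts /tmp/local_counts.txt /tmp/aws_counts.txt
```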

Step 7: Clean up temp files

Remove the dump file from EC2 and from inside the container:

# On EC2
rm /tmp/ai_tutor_data.sql

# Inside the container
docker exec deploy-postgres-1 rm /tmp/ai_tutor_data.sql

Exit the SSH session:

exit

Remove the local dump file if you no longer need it:

rm /tmp/ai_tutor_data.sql

Troubleshooting

SSH: Permission denied (publickey)

The PEM key file permissions must be restricted. Fix with:

chmod 400 ai-tutor-backend/terraform/ai-tutor-staging.pem

Then retry the ssh or scp command.
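To confirm the fix took, `stat` can print the octal mode (GNU coreutils `-c` format, as found on Linux):

```shell
# Check the real key:
#   stat -c %a ai-tutor-backend/terraform/ai-tutor-staging.pem
# Demo on a throwaway file:
touch /tmp/demo_key.pem
chmod 400 /tmp/demo_key.pem
stat -c %a /tmp/demo_key.pem
```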

WARNING: transaction_timeout / lock_timeout

These appear in the restore output when the dump contains statements like SET transaction_timeout = 0 emitted by a newer pg_dump than the Postgres version you're restoring into. They don't affect the data being loaded, and the restore continues normally.

ERROR: insert or update on table violates foreign key constraint

This happens when rows are inserted in an order that violates FK constraints. The --disable-triggers flag in the pg_dump command handles this by disabling FK checks during the restore. If you see this error, confirm the flag was included in your dump command and that you're running the restore as a superuser or the table owner.

If the error persists, you can run the restore in a single transaction with all deferrable constraints deferred. Note that psql meta-commands like \i can't be mixed into a -c string, so feed the commands on stdin instead:

docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor <<'EOF'
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
\i /tmp/ai_tutor_data.sql
COMMIT;
EOF

Keep in mind that SET CONSTRAINTS only affects constraints declared DEFERRABLE, so this helps only if the schema defines them that way.

Row counts don't match after restore

Check the psql output for ERROR: lines that may have caused some inserts to fail. Common causes are duplicate primary keys (data already existed in the target) or FK violations. If the AWS DB had existing rows before the restore, you may have duplicates rather than missing data.
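If you captured the restore output to a log (2>&1 | tee during the restore), the duplicate-key failures and the exact keys involved can be pulled out directly. `dup_keys` is a hypothetical helper; Postgres prints the offending key on the DETAIL line right after the ERROR:

```shell
# Hypothetical helper: show duplicate-key errors plus their DETAIL lines.
dup_keys() { grep -A1 'duplicate key value' "$1" || true; }

# Demo on a synthetic log:
printf 'ERROR:  duplicate key value violates unique constraint "users_pkey"\nDETAIL:  Key (id)=(1) already exists.\n' > /tmp/demo_dup.log
dup_keys /tmp/demo_dup.log
```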