Migrate Data to AWS Staging¶
This guide walks through copying PostgreSQL data from your local development database to the AWS staging environment running on EC2.
Overview¶
The staging environment runs PostgreSQL inside a Docker container on an EC2 instance. To seed it with real data, you dump the local database, transfer the file to EC2 via SCP, copy it into the container, and restore it with psql.
This is a one-time or occasional operation, not part of the normal deployment flow.
Check for existing data first
If the AWS database already contains real user data, do not proceed without backing it up first. This restore will add rows on top of existing data, but if you truncate tables beforehand you will lose whatever is there. Run a quick count on the AWS side before starting.
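A minimal sketch of that quick count, run from the EC2 host (container name and credentials match the connection details below):

```shell
# Approximate row counts per table on the AWS database before touching anything
docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor -c "
SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;
"
```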
Prerequisites¶
- SSH access to the EC2 instance using the Terraform PEM key at `ai-tutor-backend/terraform/ai-tutor-staging.pem`
- Local PostgreSQL running with data you want to migrate
- AWS EC2 instance running with Docker Compose services up (`deploy-postgres-1` container healthy)
- `pg_dump` and `psql` installed locally
- `scp` available locally
Connection Details Reference¶
| | Local | AWS (EC2) |
|---|---|---|
| Host | `localhost` | `3.151.25.120` |
| Port | `5432` | `5432` (internal to Docker) |
| Database | `ai_tutor_poc` | `ai_tutor` |
| User | `admin` | `ai_tutor` |
Step-by-Step Procedure¶
Step 1: Verify local DB has data¶
Connect to your local database and check row counts for key tables:
psql -U admin -d ai_tutor_poc -c "
SELECT
schemaname,
relname AS table_name,
n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10;
"
Approximate vs exact counts
pg_stat_user_tables row counts are estimates based on autovacuum statistics and can be stale. For exact verification, use SELECT count(*) FROM <table> on the tables that matter most.
For exact counts on key tables:
psql -U admin -d ai_tutor_poc -c "
SELECT
(SELECT count(*) FROM users) AS users,
(SELECT count(*) FROM chat_sessions) AS chat_sessions,
(SELECT count(*) FROM chat_messages) AS chat_messages,
(SELECT count(*) FROM prompts) AS prompts;
"
Note these numbers. You'll compare them against AWS after the restore.
Step 2: Verify AWS schema is up to date¶
SSH into EC2 and check that all migrations have been applied:
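Using the PEM key, user, and host from the connection details above:

```shell
ssh -i ai-tutor-backend/terraform/ai-tutor-staging.pem ec2-user@3.151.25.120
```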
Once connected, check the migration state:
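One way to check, assuming Alembic runs inside the backend container (the container name below is a placeholder; use `docker ps` to find the real one in your Compose project):

```shell
# List running containers to find the backend service name
docker ps --format '{{.Names}}'
# Then check the current migration revision (container name is illustrative)
docker exec deploy-backend-1 alembic current
```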
The output should show (head) next to the latest revision. If it doesn't, run alembic upgrade head before proceeding.
You can also list tables directly to confirm the schema exists:
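For example, with psql's `\dt` meta-command inside the Postgres container:

```shell
docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor -c "\dt"
```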
Exit the SSH session before continuing.
Step 3: Dump the local database¶
Run pg_dump with data-only mode, excluding the alembic_version table (the AWS DB manages its own migration state):
pg_dump \
-U admin \
-d ai_tutor_poc \
--data-only \
--no-owner \
--no-privileges \
--disable-triggers \
--exclude-table=alembic_version \
-f /tmp/ai_tutor_data.sql
Verify the file was created and looks reasonable:
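A couple of quick sanity checks on the file size and contents:

```shell
ls -lh /tmp/ai_tutor_data.sql
head -n 20 /tmp/ai_tutor_data.sql   # should show SET commands followed by COPY data
```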
Step 4: Transfer the dump to EC2¶
Use scp with the Terraform PEM key:
scp -i ai-tutor-backend/terraform/ai-tutor-staging.pem \
/tmp/ai_tutor_data.sql \
ec2-user@3.151.25.120:/tmp/ai_tutor_data.sql
Step 5: Restore on AWS¶
SSH back into EC2:
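Same connection as in Step 2:

```shell
ssh -i ai-tutor-backend/terraform/ai-tutor-staging.pem ec2-user@3.151.25.120
```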
Copy the dump file into the Postgres container:
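Assuming the file landed in `/tmp` on the EC2 host (Step 4):

```shell
docker cp /tmp/ai_tutor_data.sql deploy-postgres-1:/tmp/ai_tutor_data.sql
```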
Run the restore:
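A sketch of the restore, feeding the dump file to psql inside the container:

```shell
docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor -f /tmp/ai_tutor_data.sql
```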
You'll see a stream of SET commands followed by COPY output as each table loads (a default pg_dump emits COPY; INSERT statements appear only if the dump was made with --inserts). Warnings about transaction_timeout or lock_timeout are normal and can be ignored. Watch for any ERROR: lines, which indicate actual problems.
Step 6: Verify row counts match¶
Still inside the EC2 SSH session, run exact counts against the AWS database:
docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor -c "
SELECT
(SELECT count(*) FROM users) AS users,
(SELECT count(*) FROM chat_sessions) AS chat_sessions,
(SELECT count(*) FROM chat_messages) AS chat_messages,
(SELECT count(*) FROM prompts) AS prompts;
"
Compare these numbers against what you recorded in Step 1. They should match.
Step 7: Clean up temp files¶
Remove the dump file from EC2 and from inside the container:
# On EC2
rm /tmp/ai_tutor_data.sql
# Inside the container
docker exec deploy-postgres-1 rm /tmp/ai_tutor_data.sql
Exit the SSH session:
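From the EC2 shell:

```shell
exit
```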
Remove the local dump file if you no longer need it:
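Back on your local machine:

```shell
rm /tmp/ai_tutor_data.sql
```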
Troubleshooting¶
SSH: Permission denied (publickey)¶
The PEM key file permissions must be restricted. Fix with:
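Restrict the key to owner read-only:

```shell
chmod 400 ai-tutor-backend/terraform/ai-tutor-staging.pem
```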
Then retry the ssh or scp command.
WARNING: transaction_timeout / lock_timeout¶
These appear in the restore output when the dump file contains SET transaction_timeout = 0 or similar statements that the target Postgres version doesn't recognize (transaction_timeout, for example, was only added in PostgreSQL 17). They're warnings, not errors, and the restore continues normally.
ERROR: insert or update on table violates foreign key constraint¶
This happens when rows are inserted in an order that violates FK constraints. The --disable-triggers flag in the pg_dump command handles this by disabling FK checks during the restore. If you see this error, confirm the flag was included in your dump command and that you're running the restore as a superuser or the table owner.
If the error persists, you can run the restore in a single transaction with constraints deferred. Note that SET CONSTRAINTS only affects constraints declared DEFERRABLE, and that psql executes multiple -c/-f options in order inside one transaction when --single-transaction is given:
docker exec -i deploy-postgres-1 psql -U ai_tutor -d ai_tutor \
  --single-transaction \
  -c "SET CONSTRAINTS ALL DEFERRED;" \
  -f /tmp/ai_tutor_data.sql
Row counts don't match after restore¶
Check the psql output for ERROR: lines that may have caused some inserts to fail. Common causes are duplicate primary keys (data already existed in the target) or FK violations. If the AWS DB had existing rows before the restore, you may have duplicates rather than missing data.