Skip to content

Database Schema Reference

Complete reference for all database models and relationships.

Overview

The 2Sigma Backend uses PostgreSQL with SQLAlchemy 2.0 (async). The schema supports multi-tenant learning management with AI tutoring.

Core Entity Hierarchy

Country
└── University (tenant)
    └── Department
        └── Course (definition)
            ├── CourseOffering (term instance)
            │   └── CourseEnrollment (user enrollment)
            └── Module (hierarchical tree)
                └── ContentItem (learning materials)

Models

User Management

User

Core user accounts.

Table: users

Column Type Constraints Description
id Integer PK User ID
email String(255) Unique, Not Null User email (login)
password_hash String(255) Not Null Bcrypt hashed password
full_name String(255) Not Null Display name
locale String(10) Default 'en' User language
time_zone String(50) Default 'UTC' User timezone
is_active Boolean Default True Account status
current_streak_days Integer Default 0 Learning streak
last_activity_date Date Nullable Last activity
created_at DateTime(TZ) Default now() Creation time
updated_at DateTime(TZ) Default now() Last update

Relationships: - profile (One-to-One) → UserProfile - university_roles (One-to-Many) → UserUniversityRole - enrollments (One-to-Many) → CourseEnrollment - chat_sessions (One-to-Many) → ChatSession

Indexes: - ix_users_email (Unique)


UserProfile

Extended user profile from onboarding.

Table: user_profiles

Column Type Description
id Integer PK
user_id Integer FK → users.id (Unique)
date_of_birth Date Nullable
learning_goal Enum CAREER_ADVANCEMENT, SKILL_DEVELOPMENT, etc.
study_level Enum HIGH_SCHOOL, UNDERGRADUATE, etc.
grade Enum FRESHMAN, SOPHOMORE, etc.

Relationships: - user → User - subject_preferences → Subject (Many-to-Many via UserSubjectPreference)


UserUniversityRole

Multi-tenant RBAC (role per university).

Table: user_university_roles

Column Type Description
id Integer PK
user_id Integer FK → users.id
university_id Integer FK → universities.id
role Enum STUDENT, INSTRUCTOR, TA, ADMIN, etc.
department_id Integer FK → departments.id (Optional)

Unique: (user_id, university_id, role)


Geography/Tenancy

Country

Table: countries

Column Type Description
id Integer PK
code String(2) ISO 3166-1 alpha-2 (Unique)
name String(100) Country name

University

Tenant organizations.

Table: universities

Column Type Description
id Integer PK
name String(255) University name
short_code String(20) Unique code (e.g., "MIT")
country_id Integer FK → countries.id
time_zone String(50) Default timezone
website_url String(255) Nullable
is_active Boolean Default True

Unique: (country_id, short_code)


Department

Table: departments

Column Type Description
id Integer PK
university_id Integer FK → universities.id
name String(255) Department name
code String(20) Nullable

Learning Taxonomy

Domain

High-level subject areas.

Table: domains

Column Type Description
id Integer PK
name String(100) e.g., "Mathematics", "Science"

Subject

Recursive subject hierarchy.

Table: subjects

Column Type Description
id Integer PK
domain_id Integer FK → domains.id
parent_subject_id Integer FK → subjects.id (Self-referencing)
name String(100) Subject name

Relationships: - domain → Domain - parent_subject → Subject (Self) - child_subjects → List[Subject]


Course Structure

Course

Course definitions (reusable across terms).

Table: courses

Column Type Description
id Integer PK
university_id Integer FK → universities.id
department_id Integer FK → departments.id (Optional)
primary_subject_id Integer FK → subjects.id
title String(255) Course title
code String(20) Course code (e.g., "CS101")
description Text Nullable
level Enum INTRODUCTORY, INTERMEDIATE, ADVANCED, EXPERT
credit_hours Float Default 0
is_published Boolean Default False
is_coming_soon Boolean Default False
cache_enabled Boolean Default True — enable/disable LLM response caching per course

Unique: (university_id, code)

Relationships: - offerings (One-to-Many) → CourseOffering - modules (One-to-Many) → Module - subjects (Many-to-Many) → Subject via CourseSubject - prerequisites (Many-to-Many) → Course via CoursePrerequisite


CourseOffering

Course instances (terms/semesters).

Table: course_offerings

Column Type Description
id Integer PK
course_id Integer FK → courses.id
university_id Integer FK → universities.id
term_name String(100) e.g., "Spring 2026"
starts_at DateTime(TZ) Nullable
ends_at DateTime(TZ) Nullable
enrollment_start DateTime(TZ) Nullable
enrollment_end DateTime(TZ) Nullable
visibility Enum PUBLIC, UNLISTED, INVITE_ONLY

CourseEnrollment

User course registrations.

Table: course_enrollments

Column Type Description
id Integer PK
user_id Integer FK → users.id
course_offering_id Integer FK → course_offerings.id
status Enum ACTIVE, COMPLETED, DROPPED, PENDING, WAITLISTED
enrolled_at DateTime(TZ) Default now()
completed_at DateTime(TZ) Nullable

Unique: (user_id, course_offering_id)


Content Structure

Module

Hierarchical content organization (tree structure).

Table: modules

Column Type Description
id Integer PK
course_id Integer FK → courses.id
parent_module_id Integer FK → modules.id (Self-referencing)
title String(255) Module title
description Text Nullable
module_type Enum COURSE_ROOT, UNIT, CHAPTER, LESSON, TOPIC, LAB, QUIZ
sequence_index Integer Order within parent
depth Integer Tree depth
is_published Boolean Default False

Relationships: - parent_module → Module (Self) - child_modules → List[Module] - content_items (One-to-Many) → ContentItem


ContentType

Admin-managed content types.

Table: content_types

Column Type Description
id Integer PK
name String(50) Internal name (e.g., "video")
label String(100) Display name
description Text Nullable
is_active Boolean Default True

ContentItem

Polymorphic learning materials.

Table: content_items

Column Type Description
id Integer PK
module_id Integer FK → modules.id
course_id Integer FK → courses.id
content_type_id Integer FK → content_types.id
title String(255) Content title
description Text Nullable
data_json JSONB Flexible content data
learning_objectives JSONB Array of objectives
sequence_index Integer Order within module
estimated_duration_seconds Integer Nullable
prompt_name String(255) Name of prompt to use (Optional)
prompt_label String(36) Label of prompt version to resolve (Default "production")

Indexes: - GIN index on data_json


Progress Tracking

UserModuleProgress

Table: user_module_progress

Column Type Description
id Integer PK
user_id Integer FK → users.id
course_offering_id Integer FK → course_offerings.id
module_id Integer FK → modules.id
progress_percent Integer 0-100
started_at DateTime(TZ) Default now()
completed_at DateTime(TZ) Nullable

Unique: (user_id, course_offering_id, module_id)


UserContentProgress

Table: user_content_progress

Column Type Description
id Integer PK
user_id Integer FK → users.id
course_offering_id Integer FK → course_offerings.id
content_item_id Integer FK → content_items.id
progress_percent Integer 0-100
attempts_count Integer Default 0
first_started_at DateTime(TZ) Default now()
last_interacted_at DateTime(TZ) Default now()
completed_at DateTime(TZ) Nullable

Unique: (user_id, course_offering_id, content_item_id)


AI Chat System

ChatSession

Table: chat_sessions

Column Type Constraints Description
id Integer PK Session ID
user_id Integer FK → users.id Session owner
content_item_id Integer FK → content_items.id (Optional) Associated content
content_context JSONB Nullable Snapshot of content
share_enabled Boolean Default False Sharing toggle
share_token String(255) Unique, Nullable Public share token
cache_node_id BigInteger FK → llm_response_cache.id (SET NULL) Current position in LLM response cache tree (NULL = at root/start)
cache_state String(20) Not Null, Default 'TRACKING' Cache tracking state: TRACKING (following cached path) or DIVERGED (free-form message sent, cache disabled)
created_at DateTime(TZ) Default now() Creation time
updated_at DateTime(TZ) Default now() Last update

Relationships: - messages (One-to-Many) → ChatMessage - notes (One-to-Many) → ChatNote - cache_node (Many-to-One) → LLMResponseCache


ChatMessage

Table: chat_messages

Column Type Description
id Integer PK
session_id Integer FK → chat_sessions.id
role Enum USER, ASSISTANT, SYSTEM
content Text Message text
meta JSONB Model, prompt ID, etc.
created_at DateTime(TZ) Default now()

ChatNote

Table: chat_notes

Column Type Description
id Integer PK
session_id Integer FK → chat_sessions.id
message_id Integer FK → chat_messages.id (Optional)
content Text Note content
selected_text Text Highlighted text
created_at DateTime(TZ) Default now()

Prompt Management

Prompt

Langfuse-style versioned LLM prompts with label-based version management.

Table: prompts

Column Type Description
id BigInteger PK
name String(255) Prompt name (e.g., "General Protocol")
version Integer Auto-incrementing version per name
type String(20) Prompt type (default "text")
prompt Text Prompt template content (nullable)
config JSON Version-specific configuration (default {})
labels ARRAY(Text) Version labels (e.g., ["production", "latest"])
tags ARRAY(Text) Tags synchronized across all versions of a name
commit_message String(500) Version commit message (nullable)
created_by String(255) User ID or "API"
created_at DateTime(TZ) Default now()
updated_at DateTime(TZ) Default now()

Constraints: - Unique: (name, version) — no duplicate versions per prompt name - Labels are unique per prompt name (a label can only exist on one version at a time) - latest label is auto-assigned to the newest version - production is the default label for resolution

Indexes: - ix_prompts_name on (name) - ix_prompts_labels on (labels) using GIN - ix_prompts_tags on (tags) using GIN - ix_prompts_created_at on (created_at)


PromptDependency

Tracks inline prompt composition references (@@@langfusePrompt:name=X|label=Y@@@).

Table: prompt_dependencies

Column Type Description
id BigInteger PK
parent_id BigInteger FK → prompts.id (CASCADE)
child_name String(255) Referenced prompt name
child_version Integer Referenced version (nullable)
child_label String(36) Referenced label (nullable)
created_at DateTime(TZ) Default now()
updated_at DateTime(TZ) Default now()

Constraints: - Check: exactly one of child_version or child_label must be non-null - Max nesting depth: 5 levels - Circular dependencies are detected and rejected

Indexes: - ix_prompt_deps_parent on (parent_id) - ix_prompt_deps_child on (child_name)


LLM Response Cache

LLMResponseCache

Tree-structured cache for LLM responses. Eliminates redundant LLM calls when users follow identical conversation paths on the same topic. Each node represents one turn in a conversation — the root is the greeting, and children branch at each MCQ answer choice.

Table: llm_response_cache

Column Type Constraints Description
id BigInteger PK Node ID
content_item_id BigInteger FK → content_items.id (CASCADE), Not Null Topic this cache tree belongs to
parent_id BigInteger FK → llm_response_cache.id (CASCADE), Nullable Parent node (NULL for root/greeting)
prompt_id BigInteger Not Null Prompt ID used for generation
prompt_version Integer Not Null Prompt version (cache key component)
llm_provider String(50) Not Null LLM provider name (e.g., "anthropic", "aws_bedrock")
llm_model_id String(255) Not Null LLM model identifier (cache key component)
user_message_hash String(64) Not Null SHA-256 hash of normalized user message
user_message Text Not Null Raw user message (for debugging)
response_content Text Not Null Full LLM response text
response_meta JSONB Default '{}' Prompt metadata (same shape as chat_messages.meta)
response_chunks JSONB Default '[]' Original SSE chunk boundaries for replay
hit_count Integer Not Null, Default 0 Number of times this cached response was served
created_at DateTime(TZ) Default now() Creation time
updated_at DateTime(TZ) Default now() Last update

Relationships: - parent (Many-to-One, Self) → LLMResponseCache - children (One-to-Many, Self) → LLMResponseCache - content_item (Many-to-One) → ContentItem

Partial Unique Indexes (PostgreSQL WHERE clause indexes — required because NULL ≠ NULL in standard unique constraints): - uq_cache_child_node(parent_id, user_message_hash, prompt_version, llm_model_id, llm_provider) WHERE parent_id IS NOT NULL - uq_cache_root_node(content_item_id, user_message_hash, prompt_version, llm_model_id, llm_provider) WHERE parent_id IS NULL

Indexes: - ix_llm_cache_content_item on (content_item_id) - ix_llm_cache_parent on (parent_id) - ix_llm_cache_lookup on (parent_id, user_message_hash) — fast child node lookup - ix_llm_cache_root on (content_item_id, prompt_version, llm_model_id, llm_provider) WHERE parent_id IS NULL — fast greeting/root lookup

Cache Key Composition: - Root nodes keyed by: (content_item_id, user_message_hash, prompt_version, llm_model_id, llm_provider) - Child nodes keyed by: (parent_id, user_message_hash, prompt_version, llm_model_id, llm_provider) - When prompt version or model changes, new requests naturally miss the cache and a new tree grows


Feedback

FeatureRequest

User-submitted feature requests with admin status tracking.

Table: feature_requests

Column Type Description
id Integer PK
user_id Integer FK → users.id
title String(255) Feature request title
description Text Detailed description
status Enum PENDING, UNDER_REVIEW, PLANNED, COMPLETED, DECLINED
admin_notes Text Internal notes (nullable)
created_at DateTime Server default now()
updated_at DateTime Server default now(), auto-updates

Relationships: - user (Many-to-One) → User


System Configuration


SystemSetting

Key-value configuration store.

Table: system_settings

Column Type Description
id Integer PK
setting_key String(100) Unique key
setting_value Text Value (encrypted for secrets)
is_encrypted Boolean Default False
description Text Nullable
updated_at DateTime(TZ) Default now()

Database Enums

UserRole

STUDENT, INSTRUCTOR, TA, DEPARTMENT_ADMIN, UNIVERSITY_ADMIN, PLATFORM_ADMIN

LearningGoal

CAREER_ADVANCEMENT, SKILL_DEVELOPMENT, ACADEMIC_REQUIREMENT, PERSONAL_INTEREST, CERTIFICATION

StudyLevel

HIGH_SCHOOL, UNDERGRADUATE, GRADUATE, DOCTORATE, PROFESSIONAL

CourseLevel

INTRODUCTORY, INTERMEDIATE, ADVANCED, EXPERT

EnrollmentStatus

ACTIVE, COMPLETED, DROPPED, PENDING, WAITLISTED

ModuleType

COURSE_ROOT, UNIT, CHAPTER, LESSON, TOPIC, SECTION, LAB, QUIZ

MessageRole

USER, ASSISTANT, SYSTEM

FeatureRequestStatus

PENDING, UNDER_REVIEW, PLANNED, COMPLETED, DECLINED

Indexes and Performance

Key Indexes: - All foreign keys have indexes - Unique constraints on: user email, university short_code, course code - Composite indexes: (user_id, course_offering_id), (module_id, sequence_index) - GIN indexes on JSONB columns: content_items.data_json, chat_sessions.content_context - GIN indexes on ARRAY columns: prompts.labels, prompts.tags - Partial unique indexes on llm_response_cache (separate indexes for root vs child nodes using WHERE clauses) - Composite lookup index on llm_response_cache (parent_id, user_message_hash) for O(1) cache lookups

Cascade Deletes: - Most relationships use CASCADE delete - Example: Deleting a course deletes its offerings, enrollments, and modules

Migration History

26 migrations from 2025-11-27 to 2026-03-02. See Database Migrations Guide.

Next Steps