Prompt Metadata Tracking

Overview

As of this update, chat messages track which prompt was used when generating AI responses. This metadata is stored in the meta JSONB column of the chat_messages table.

What Gets Tracked

When an AI assistant message is created, the following metadata is automatically saved:

{
  "prompt_id": 5,
  "prompt_version": 3,
  "prompt_name": "Video Tutor",
  "prompt_label": "production",
  "content_type": "video",
  "model": "claude-3-5-sonnet-20241022",
  "provider": "anthropic"
}

Metadata Fields

  • prompt_id: Database ID of the prompt version used
  • prompt_version: Version number of that prompt
  • prompt_name: Name of the prompt
  • prompt_label: Label that was used to resolve the prompt version (e.g., "production")
  • content_type: Content type of the content item being tutored (text, video, quiz, etc.)
  • model: The AI model used (e.g., "claude-3-5-sonnet-20241022")
  • provider: The LLM provider ("anthropic" or "aws_bedrock")
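Taken together, the fields above amount to a single dictionary built at generation time. The sketch below shows one way that assembly might look; the helper name and the attribute names on the prompt and content objects are illustrative assumptions, not the actual model fields:

```python
def build_prompt_metadata(prompt, content_item, model, provider):
    """Assemble the metadata dict saved with an assistant message.

    `prompt` and `content_item` attribute names here are assumptions
    for illustration, not the real app.models definitions.
    """
    return {
        "prompt_id": prompt.id,
        "prompt_version": prompt.version,
        "prompt_name": prompt.name,
        "prompt_label": prompt.label,
        "content_type": content_item.content_type,
        "model": model,
        "provider": provider,
    }
```

Because the result is a plain dict, it can be passed straight through to the `meta` JSONB column without any serialization step.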

Implementation Details

Code Changes

  1. app/services/llm_service.py:
     • _build_prompt() now returns a tuple: (prompt_text, metadata_dict)
     • stream_response() yields tuples of (content_chunk, metadata)
     • Metadata is only included in the first chunk to avoid duplication
  2. app/api/v1/chat.py:
     • The send_message() endpoint captures metadata from the first chunk
     • Passes metadata to create_chat_message() when saving the assistant's response
  3. app/crud/chat.py:
     • create_chat_message() already accepted a meta parameter; no changes were needed
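The consumer side of this flow can be sketched as follows. This is a simplified stand-in for the real endpoint in app/api/v1/chat.py (the function and parameter names here are assumptions): since metadata only rides along with the first chunk, the consumer captures it once and passes it on when persisting the message.

```python
async def consume_stream(stream, save_message):
    """Illustrative consumer for (content_chunk, metadata) tuples.

    Assumes the stream yields metadata only with the first chunk,
    and None thereafter, matching the behavior described above.
    """
    chunks = []
    prompt_meta = None
    async for chunk, metadata in stream:
        if metadata is not None and prompt_meta is None:
            # First chunk carries the prompt metadata; capture it once.
            prompt_meta = metadata
        chunks.append(chunk)
    # Persist the full response with whatever metadata was captured.
    await save_message(content="".join(chunks), meta=prompt_meta or {})
```

The `prompt_meta or {}` fallback means a stream that never produced metadata still saves a valid (empty) meta object rather than NULL.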

Database Schema

No schema changes required! The chat_messages.meta JSONB column already existed and supports arbitrary JSON data.

Use Cases

Audit Trail

Query which prompt version was used for specific conversations:

SELECT id, created_at,
  meta->>'prompt_name' AS prompt_name,
  meta->>'prompt_version' AS prompt_version
FROM chat_messages
WHERE role = 'assistant'
  AND session_id = 123;

A/B Testing

Compare responses from different prompt versions:

SELECT 
  meta->>'prompt_version' as version,
  COUNT(*) as message_count,
  AVG(LENGTH(content)) as avg_response_length
FROM chat_messages
WHERE role = 'assistant'
  AND meta->>'prompt_id' IS NOT NULL
GROUP BY meta->>'prompt_version';

Debugging

Find all messages that used a specific prompt:

SELECT session_id, content, created_at
FROM chat_messages
WHERE role = 'assistant'
  AND meta->>'prompt_id' = '5';

Important Notes

Backward Compatibility

  • ✅ Existing messages without metadata continue to work normally
  • ✅ Empty meta objects ({}) are valid
  • ⚠️ Messages created before this update will have empty meta fields
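Because pre-update messages carry empty (or missing) metadata, any code reading these fields should not assume they exist. A minimal defensive accessor, assuming `meta` may be None or an empty dict:

```python
def get_prompt_version(message_meta):
    """Safely read the prompt version from a message's meta field.

    Returns None for messages created before metadata tracking,
    where meta may be None or {}.
    """
    if not message_meta:
        return None
    return message_meta.get("prompt_version")
```

The same pattern applies to any of the other metadata fields.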

Future Improvements

Consider tracking additional metadata:

  • Response latency/timing
  • Token counts (input/output)
  • Temperature and other model parameters
  • User feedback/ratings on responses

Testing

To verify metadata is being saved:

from sqlalchemy import select
from app.models.chat import ChatMessage, MessageRole

# Get recent assistant messages
result = await db.execute(
    select(ChatMessage)
    .where(ChatMessage.role == MessageRole.ASSISTANT)
    .order_by(ChatMessage.created_at.desc())
    .limit(10)
)

for msg in result.scalars():
    print(f"Message {msg.id}: {msg.meta}")

Related Files
  • app/services/llm_service.py - LLM service with metadata generation
  • app/api/v1/chat.py - Chat API endpoint that captures metadata
  • app/crud/prompt.py - Prompt CRUD operations
  • app/models/prompt.py - Prompt and PromptDependency model definitions
  • app/models/chat.py - ChatMessage model definition