Content Management

Geetanjali uses a code-first approach where content metadata is authored in Python files and synced to the database. This enables version control, code review, and repeatable deployments.

Content Types

Type	Source	Records	Sync Method
Verses	External APIs	700	`ingest_data.py`
Featured Verses	`featured_verses.py`	180	Automatic startup sync
Chapter Metadata	`chapter_metadata.py`	18 + 1 book	Automatic startup sync
Geeta Dhyanam	`geeta_dhyanam.py`	9	Automatic startup sync
Audio Metadata	`verse_audio_metadata/`	700	Automatic startup sync
Audio Durations	MP3 files (ffprobe)	700	Automatic startup sync
Audio Files	Colab TTS pipeline	709 MP3s	Manual

Scripts

Data Ingestion

Script	Purpose
`ingest_data.py`	Initial verse ingestion from external sources
`init_db.py`	Database schema initialization
`backfill_paraphrase_metadata.py`	Sync paraphrases to ChromaDB

Audio Processing

Script	Purpose
`export_tts_metadata.py`	Export verse data for Colab TTS
`export_dhyanam_metadata.py`	Export Dhyanam data for Colab
`process_tts_audio.py`	Convert Colab WAV to MP3
`process_dhyanam_audio.py`	Convert Dhyanam WAV to MP3
`qa_audio_files.py`	QA check for truncation/anomalies

All scripts are in backend/scripts/.

Note: Audio durations are extracted automatically on startup via StartupSyncService. No manual script needed.

Automatic Startup Sync

On each backend startup, StartupSyncService automatically syncs all curated content using hash-based change detection:

Metadata (book + 18 chapters)
Dhyanam Verses (9 invocation verses)
Featured Verses (180 flagged verses)
Audio Metadata (speaker, tone, pacing for TTS)
Audio Durations (extracted from MP3 files via ffprobe)

The service computes SHA256 hashes of source data and only syncs when content has changed. Force sync with FORCE_CONTENT_SYNC=true env var.

Admin API

Manual sync endpoints for testing or recovery:

# Sync featured verses
curl -X POST http://localhost:8000/api/v1/admin/sync-featured \
  -H "X-API-Key: YOUR_KEY"

# Sync chapter metadata
curl -X POST http://localhost:8000/api/v1/admin/sync-metadata \
  -H "X-API-Key: YOUR_KEY"

# Sync dhyanam verses
curl -X POST http://localhost:8000/api/v1/admin/sync-dhyanam \
  -H "X-API-Key: YOUR_KEY"

# Sync audio metadata (TTS hints)
curl -X POST http://localhost:8000/api/v1/admin/sync-audio-metadata \
  -H "X-API-Key: YOUR_KEY"

# Trigger verse enrichment (LLM paraphrases)
curl -X POST http://localhost:8000/api/v1/admin/enrich \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"limit": 50, "force": false}'

# Check status
curl http://localhost:8000/api/v1/admin/status \
  -H "X-API-Key: YOUR_KEY"

Note: These endpoints are redundant for normal operation—startup sync handles everything. Use for manual intervention or testing.

Data Files

Located in backend/data/:

File	Content
`featured_verses.py`	180 curated verse IDs
`chapter_metadata.py`	Book + 18 chapter intros
`geeta_dhyanam.py`	9 invocation verses
`verse_audio_metadata/`	TTS generation configs

Audio Metadata Hierarchy

Audio metadata resolves in order:

Explicit chapter config (chapter_02.py)
Maha vakya overrides (maha_vakyas.py)
Chapter defaults (defaults.py[chapter])
Speaker defaults (defaults.py[speaker])
Global defaults (defaults.py[GLOBAL])

Audio Generation

Full pipeline:

1. Export metadata
   docker compose exec backend python /app/scripts/export_tts_metadata.py --chapter N

2. Generate in Colab (upload JSON, run cells, download ZIP)

3. Process audio
   docker compose exec backend python /app/scripts/process_tts_audio.py chapter_N_wav.zip

4. QA check
   docker compose exec backend python /app/scripts/qa_audio_files.py --chapter N

5. Restart backend (durations extracted automatically on startup)
   docker compose restart backend

QA Thresholds

Check	Threshold
Minimum duration	3 seconds (truncation detection)
Syllable rate	1.5–5.0 per second
Maximum duration	60s (verses), 120s (dhyanam)

Database Tables

Table	Key Fields
`verses`	canonical_id, sanskrit_devanagari, translation_en, is_featured
`verse_audio_metadata`	canonical_id, audio_duration_ms, audio_file_path
`book_metadata`	book_key, intro_text, verse_count
`chapter_metadata`	chapter_number, summary, hero_verse_id

Troubleshooting

Issue	Solution
Sync returns 0 records	Run `ingest_data.py` first
Audio not playing	Check `audio_file_path` in verse_audio_metadata
TTS truncation	Text normalizer should convert `।` to `,`
Duration missing	Restart backend (auto-extracts on startup) or check MP3 files exist

Verification

# Check verse count
curl http://localhost:8000/api/v1/admin/status

# Check audio file exists
ls -la public/audio/mp3/02/BG_2_47.mp3

# Check duration in DB
docker compose exec db psql -U postgres -c \
  "SELECT canonical_id, audio_duration_ms FROM verse_audio_metadata WHERE canonical_id='BG_2_47'"