Data

Bhagavad Geeta content sources, licensing, and ingestion.

Sources

Primary: gita/gita Repository

Secondary: VedicScriptures API

Sanskrit Text

The original Bhagavad Geeta verses are ancient texts (composed roughly between the 5th century BCE and the 2nd century CE). They are in the public domain and can be used worldwide without copyright restrictions.

Data Structure

Each verse contains:

{
  "canonical_id": "BG_2_47",
  "chapter": 2,
  "verse": 47,
  "sanskrit_devanagari": "कर्मण्येवाधिकारस्ते...",
  "sanskrit_iast": "karmaṇy-evādhikāras te...",
  "translations": [
    {
      "author": "Swami Sivananda",
      "text": "Your right is to work only..."
    }
  ],
  "commentaries": [
    {
      "author": "Adi Shankaracharya",
      "text": "..."
    }
  ]
}
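
A minimal sketch of how a record of this shape could be checked before ingestion. It assumes that every top-level field in the example is required and that canonical_id always follows the BG_<chapter>_<verse> pattern; neither rule is stated by the project, and validate_verse is a hypothetical helper, not the backend's actual validator.

```python
from typing import Any

# Assumption: every top-level field in the example record is required.
REQUIRED_FIELDS = {
    "canonical_id": str,
    "chapter": int,
    "verse": int,
    "sanskrit_devanagari": str,
    "sanskrit_iast": str,
    "translations": list,
    "commentaries": list,
}


def validate_verse(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    if errors:
        return errors  # skip deeper checks until the basic shape is right

    # Assumption: canonical_id is always BG_<chapter>_<verse>, as in the example.
    expected_id = f"BG_{record['chapter']}_{record['verse']}"
    if record["canonical_id"] != expected_id:
        errors.append(f"canonical_id should be {expected_id}")

    # Each translation and commentary needs an author and a text.
    for key in ("translations", "commentaries"):
        for i, entry in enumerate(record[key]):
            if not isinstance(entry, dict) or not {"author", "text"} <= entry.keys():
                errors.append(f"{key}[{i}] needs 'author' and 'text'")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it possible to report every issue in a data file in one pass.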

Ingestion Pipeline

On first run, the backend:

  1. Reads verse JSON from the data/ directory
  2. Validates the structure and required fields of each verse
  3. Inserts the records into PostgreSQL (the verses, translations, and commentaries tables)
  4. Generates embeddings using sentence-transformers
  5. Stores the vectors in ChromaDB for semantic search
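
Steps 1 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the backend's code: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so the example runs without a database server, the column layout is hypothetical (only the three table names come from the list above), and it assumes each JSON file holds either one verse object or a list of them. Validation (step 2) and embedding (steps 4 and 5) are left out.

```python
import json
import sqlite3
from pathlib import Path


def load_verses(data_dir):
    """Yield verse dicts from every *.json file in data_dir (step 1)."""
    for path in sorted(Path(data_dir).glob("*.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: a file holds either one verse object or a list of them.
        yield from payload if isinstance(payload, list) else [payload]


def create_schema(conn):
    # Hypothetical columns; only the table names come from the pipeline description.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS verses (
            canonical_id TEXT PRIMARY KEY, chapter INTEGER, verse INTEGER,
            sanskrit_devanagari TEXT, sanskrit_iast TEXT);
        CREATE TABLE IF NOT EXISTS translations (
            canonical_id TEXT REFERENCES verses, author TEXT, text TEXT);
        CREATE TABLE IF NOT EXISTS commentaries (
            canonical_id TEXT REFERENCES verses, author TEXT, text TEXT);
    """)


def insert_verse(conn, v):
    """Insert one verse and its child rows (step 3)."""
    conn.execute(
        "INSERT OR REPLACE INTO verses VALUES (?, ?, ?, ?, ?)",
        (v["canonical_id"], v["chapter"], v["verse"],
         v["sanskrit_devanagari"], v["sanskrit_iast"]),
    )
    for table in ("translations", "commentaries"):
        # Clear old child rows so re-ingestion does not duplicate them.
        conn.execute(f"DELETE FROM {table} WHERE canonical_id = ?", (v["canonical_id"],))
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?)",
            [(v["canonical_id"], e["author"], e["text"]) for e in v.get(table, [])],
        )


def ingest(data_dir, conn):
    """Load every verse under data_dir into conn and return the verse count."""
    create_schema(conn)
    count = 0
    with conn:  # commit once at the end, or roll back if any insert fails
        for verse in load_verses(data_dir):
            insert_verse(conn, verse)
            count += 1
    return count
```

Calling ingest("data", sqlite3.connect(":memory:")) loads every verse file into an in-memory database. Because the child rows are deleted before being re-inserted, running it a second time leaves the row counts unchanged, which is the behavior one would want from a re-ingestion command.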

Manual re-ingestion:

docker compose exec backend python scripts/ingest_data.py --all

Embeddings

Attribution

While the Unlicense doesn’t require attribution, we acknowledge: