Data
Bhagavad Geeta content sources, licensing, and ingestion.
Sources
Primary: gita/gita Repository
- URL: https://github.com/gita/gita
- License: The Unlicense (Public Domain)
- Content: 701 verses across 18 chapters
- Format: JSON with Sanskrit, transliteration, translations
Secondary: VedicScriptures API
- URL: https://github.com/vedicscriptures/bhagavad-gita-api
- License: MIT
- Content: Additional translations and commentaries
- Used for: Enriching verse data with multiple perspectives
Sanskrit Text
The original Bhagavad Geeta verses are ancient texts (~5th century BCE to 2nd century CE), freely usable worldwide without copyright restrictions.
Data Structure
Each verse contains:
{
"canonical_id": "BG_2_47",
"chapter": 2,
"verse": 47,
"sanskrit_devanagari": "कर्मण्येवाधिकारस्ते...",
"sanskrit_iast": "karmaṇy-evādhikāras te...",
"translations": [
{
"author": "Swami Sivananda",
"text": "Your right is to work only..."
}
],
"commentaries": [
{
"author": "Adi Shankaracharya",
"text": "..."
}
]
}
Ingestion Pipeline
On first run, the backend:
- Reads verse JSON from
data/directory - Validates structure and required fields
- Inserts into PostgreSQL (verses, translations, commentaries tables)
- Generates embeddings using sentence-transformers
- Stores vectors in ChromaDB for semantic search
Manual re-ingestion:
docker compose exec backend python scripts/ingest_data.py --all
Embeddings
- Model:
sentence-transformers/all-MiniLM-L6-v2 - Dimensions: 384
- Indexed fields: Sanskrit IAST + English translation
- Storage: ChromaDB with cosine similarity
Attribution
While the Unlicense doesn’t require attribution, we acknowledge:
- gita/gita repository maintainers
- VedicScriptures API contributors
- Traditional commentators (Shankaracharya, Ramanuja, etc.)
- Translators whose work enables access to this wisdom