Audio Recitations
Geetanjali includes AI-generated Sanskrit recitations of the Bhagavad Gita verses. Hear proper pronunciation while you read.
Overview
Each verse can be played aloud with a natural Sanskrit recitation. The voice is Aryan, a male voice trained specifically for Indian languages, with clear enunciation and a measured pace suited to contemplation.
Technology: Indic Parler-TTS by AI4Bharat
Current coverage: All 701 verses + 9 Geeta Dhyanam invocations
How It Works
Sanskrit Text   →   AI Voice Synthesis   →   Audio File
      ↓                     ↓                    ↓
Devanagari            Aryan voice            MP3 (128kbps)
from source           with curated           ready for
scripture             tempo/emotion          playback
Each verse is individually processed with metadata that controls:
| Aspect | Effect |
|---|---|
| Tempo | Measured pace for most verses; slower for key teachings |
| Emotion | Serene for wisdom, firm for commands, compassionate for consolation |
| Emphasis | Special treatment for maha vakyas (great sayings) |
Maha Vakyas
Foundational verses like BG 2.47 (karmaṇy evādhikāras te) receive enhanced treatment:
- Slower, more deliberate delivery
- Greater gravitas in tone
- Pauses for absorption
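As a rough illustration of the per-verse metadata described above, an entry could be modelled like this. The field names (`tempo`, `emotion`, `maha_vakya`), the `VerseVoiceMeta` class, and the `describe_voice` helper are hypothetical and only mirror the aspects in the table; the actual schema in backend/data/verse_audio_metadata/ may differ. Parler-TTS models are steered by a plain-language voice description, which is what the helper sketches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerseVoiceMeta:
    """Hypothetical per-verse voice parameters (not the project's real schema)."""
    canonical_id: str          # e.g. "BG_2_47"
    tempo: str = "measured"    # measured pace for most verses
    emotion: str = "serene"    # e.g. "serene", "firm", "compassionate"
    maha_vakya: bool = False   # great sayings get slower, weightier delivery


def describe_voice(meta: VerseVoiceMeta) -> str:
    """Turn the metadata into a plain-language voice description."""
    tempo = "slow, deliberate" if meta.maha_vakya else meta.tempo
    parts = [f"Aryan speaks in a {meta.emotion} tone at a {tempo} pace"]
    if meta.maha_vakya:
        parts.append("with added gravitas and pauses between phrases")
    return " ".join(parts) + "."


# The emotion chosen for BG 2.47 here is an illustrative guess.
bg_2_47 = VerseVoiceMeta("BG_2_47", emotion="firm", maha_vakya=True)
print(describe_voice(bg_2_47))
```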
Audio Quality
- Format: MP3, 128kbps stereo
- Duration: 7–15 seconds per verse (varies with length)
- Voice: Aryan (male, Sanskrit-optimized)
- Clarity: Optimized for both speakers and headphones
Playback Features
Single Verse
Click the play button on any verse card or detail page. Controls include:
- Play/pause toggle
- Progress bar with seek
- Playback speed (0.75×, 1×, 1.25×)
- Loop mode for memorization
Reading Mode
Sequential playback with two auto-advance modes:
| Mode | Behavior |
|---|---|
| Listen | Plays audio, advances after completion + 800ms pause |
| Read | Timer-based at 80% of audio duration (silent reading pace) |
The next verse audio preloads at 80% progress to eliminate gaps.
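The timing rules above reduce to a few lines of arithmetic. The sketch below is written in Python for brevity and is not the code in useAutoAdvance.ts; the function and constant names are made up for illustration, while the 800ms pause and both 80% thresholds come from this section.

```python
LISTEN_PAUSE_S = 0.8     # pause after the audio completes (Listen mode)
READ_FRACTION = 0.8      # Read mode timer as a fraction of audio duration
PRELOAD_FRACTION = 0.8   # progress at which the next verse starts preloading


def advance_delay(mode: str, duration_s: float) -> float:
    """Seconds from the start of a verse until auto-advance fires."""
    if mode == "listen":
        return duration_s + LISTEN_PAUSE_S
    if mode == "read":
        return duration_s * READ_FRACTION
    raise ValueError(f"unknown mode: {mode!r}")


def should_preload(position_s: float, duration_s: float) -> bool:
    """True once playback has reached the preload threshold."""
    return duration_s > 0 and position_s / duration_s >= PRELOAD_FRACTION
```

For a 10-second verse, Listen mode advances after about 10.8 seconds, Read mode after 8 seconds, and the next file begins preloading at the 8-second mark.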
Study Mode
Guided narration through verses with auto-advance:
Chapter Intro → Verse 1 → Verse 2 → ... → Chapter Complete
                   ↓
                Sanskrit → English → Hindi → Insight → Next
Flow:
- Chapter intro (summary narration via TTS)
- Verse announcement (“Verse 1 of 72”)
- Sanskrit audio recitation
- English translation TTS
- Hindi translation TTS (if enabled)
- Insight/commentary TTS
- Auto-advance to next verse (2s pause)
- Chapter completion prompt at end
Trigger: Study button in MiniPlayer
Controls:
| Action | Gesture | Keyboard |
|---|---|---|
| Pause/Resume | Tap screen | Space |
| Skip section | — | → |
| Skip verse | — | ↓ |
| Stop | Tap Stop | Escape |
Settings (in Settings page):
- Include Hindi translations
- Include commentary insights
- Play chapter introduction
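One way to picture how the flow and the three settings interact is as a flat playback plan. This is a Python sketch only, not the logic in useStudyAutoMode.ts; the `StudySettings` class, the section labels, and the default values are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudySettings:
    # Defaults are assumptions; the app's actual defaults are not documented here.
    include_hindi: bool = True
    include_insight: bool = True
    play_chapter_intro: bool = True


def verse_sections(settings: StudySettings) -> List[str]:
    """Sections narrated for one verse, in order."""
    sections = ["announcement", "sanskrit", "english"]
    if settings.include_hindi:
        sections.append("hindi")
    if settings.include_insight:
        sections.append("insight")
    return sections


def chapter_plan(verse_count: int, settings: StudySettings) -> List[str]:
    """Flatten a chapter into the ordered list of steps to play."""
    plan: List[str] = []
    if settings.play_chapter_intro:
        plan.append("chapter_intro")
    for verse in range(1, verse_count + 1):
        for section in verse_sections(settings):
            plan.append(f"verse_{verse}:{section}")
    plan.append("chapter_complete")
    return plan
```

Skipping a section moves one step forward in this plan; skipping a verse jumps to the next `announcement` step.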
UI: Progress bar, section dots, verse position counter, status text.
Accessibility: aria-live announcements, 44px touch targets, keyboard shortcuts.
Media Controls
Lock screen and notification controls work via the Media Session API. Pause, resume, and see verse info without unlocking your device.
Offline Audio
Audio files cache automatically for offline playback.
How it works:
- Audio plays immediately via nginx (no blocking)
- Full file cached in background for subsequent plays
- Cloud icon in MiniPlayer indicates cached status
Cache limits:
- 100MB quota with LRU eviction
- Manage in Settings → Audio Cache (see file count, clear cache)
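The quota-plus-LRU behaviour can be sketched as follows. This is an illustrative Python model, not the Service Worker code; the `LruAudioCache` class and its method names are hypothetical, and only the 100MB quota and the LRU policy come from this section.

```python
from collections import OrderedDict
from typing import List

QUOTA_BYTES = 100 * 1024 * 1024  # 100MB quota


class LruAudioCache:
    """Tracks cached file sizes and evicts the least recently used first."""

    def __init__(self, quota_bytes: int = QUOTA_BYTES) -> None:
        self.quota_bytes = quota_bytes
        self.total_bytes = 0
        self._sizes: "OrderedDict[str, int]" = OrderedDict()  # oldest first

    def touch(self, url: str) -> bool:
        """Mark a file as recently played; returns False on a cache miss."""
        if url not in self._sizes:
            return False
        self._sizes.move_to_end(url)
        return True

    def put(self, url: str, size: int) -> List[str]:
        """Record a newly cached file and return the URLs evicted to fit it."""
        if size > self.quota_bytes:
            raise ValueError("file is larger than the whole quota")
        if url in self._sizes:
            self.total_bytes -= self._sizes.pop(url)
        self._sizes[url] = size
        self.total_bytes += size
        evicted: List[str] = []
        while self.total_bytes > self.quota_bytes:
            old_url, old_size = self._sizes.popitem(last=False)
            self.total_bytes -= old_size
            evicted.append(old_url)
        return evicted
```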
Reliability features:
- Cache-first serving: cached files are returned without revalidation, so playback starts fast
- 20-second load timeout with a clear error message
- Downloads are validated (size, completeness) before they are written to the cache
- Background caching: audio plays immediately and is cached asynchronously
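A download check of the kind listed above might look like this. It is a sketch under assumptions: the minimum-size floor and the function name are invented for illustration, and the real check may use different criteria.

```python
from typing import Optional

MIN_AUDIO_BYTES = 1024  # hypothetical floor; anything smaller is treated as broken


def is_complete_download(body: bytes, content_length: Optional[int],
                         min_bytes: int = MIN_AUDIO_BYTES) -> bool:
    """Accept a download only if it is plausibly a whole audio file."""
    if len(body) < min_bytes:
        return False  # too small to be a real recitation
    if content_length is not None and len(body) != content_length:
        return False  # truncated (or padded) relative to Content-Length
    return True
```

Only files that pass a check like this would be written to the cache, so a dropped connection cannot leave a truncated MP3 behind.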
Technical:
- Service Worker intercepts /audio/ requests
- Cache hit: serves immediately without validation
- Cache miss: fetches from network, caches in background
- Supports Range requests for seeking within cached files
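Serving a Range request from a cached file comes down to parsing the header and slicing the stored bytes. The Python sketch below shows the idea for single ranges only; it is not the Service Worker implementation, the function names are hypothetical, and multi-range headers are simply rejected here for brevity.

```python
from typing import Dict, Optional, Tuple


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Parse 'bytes=start-end' into an inclusive (start, end), or None."""
    if size <= 0 or not header.startswith("bytes=") or "," in header:
        return None
    start_s, sep, end_s = header[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if start_s == "":                 # suffix form: the last N bytes
            n = int(end_s)
            if n <= 0:
                return None
            return max(size - n, 0), size - 1
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if start >= size or start > end:
        return None
    return start, min(end, size - 1)


def partial_response(body: bytes, header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, payload) for a cached file and a Range header."""
    rng = parse_range(header, len(body))
    if rng is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = rng
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, body[start:end + 1]
```

Seeking in the progress bar typically produces an open-ended header such as `bytes=48000-`, which this sketch answers with a 206 response covering the rest of the file.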
For Contributors
Generation Workflow
Audio generation uses a three-step pipeline:
- Export metadata from the database (runs locally via Docker)
- Generate audio using Indic Parler-TTS in Google Colab (GPU required)
- Post-process WAV to MP3 and organize files
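For the post-processing step, a WAV to MP3 conversion matching the documented format (128kbps stereo) could be driven through ffmpeg. This is an assumption about tooling: process_tts_audio.py may use a different encoder or options, and `ffmpeg_mp3_command` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def ffmpeg_mp3_command(wav_path: Path, mp3_path: Path) -> List[str]:
    """Build an ffmpeg invocation that encodes a WAV file as 128kbps stereo MP3."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(wav_path),      # input WAV from the Colab run
        "-codec:a", "libmp3lame", # MP3 encoder
        "-b:a", "128k",           # constant 128kbps audio bitrate
        "-ac", "2",               # two channels (stereo)
        str(mp3_path),
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.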
Scripts are in backend/scripts/:
| Script | Purpose |
|---|---|
| export_tts_metadata.py | Export verse text + voice parameters |
| indic_parler_tts.ipynb | Colab notebook for TTS generation |
| process_tts_audio.py | Convert WAV → MP3, organize files |
Adding a New Chapter
- Curate voice metadata in backend/data/verse_audio_metadata/chapter_XX.py
- Export: docker compose exec backend python /app/scripts/export_tts_metadata.py --chapter XX
- Generate in Colab (upload JSON, run cells, download ZIP)
- Process: python3 backend/scripts/process_tts_audio.py ~/Downloads/chapter_XX_wav.zip
- Commit MP3 files to public/audio/mp3/XX/
Resuming Interrupted Generation
The Colab notebook checkpoints progress. If your session disconnects:
- Reconnect and re-run setup cells
- Re-upload the same metadata JSON
- Run the generation cell—it resumes from the checkpoint
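Checkpointed generation generally follows the pattern below: record each finished verse, and skip recorded verses on the next run. This sketch is not taken from the notebook; the JSON checkpoint format, the function names, and the stand-in `synthesize` callback are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(checkpoint: Path) -> Set[str]:
    """Read the IDs already generated, or an empty set on a fresh start."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text(encoding="utf-8")))
    return set()


def generate_all(verse_ids: Iterable[str], synthesize: Callable[[str], None],
                 checkpoint: Path) -> List[str]:
    """Synthesize every verse not yet in the checkpoint; return what was done."""
    done = load_done(checkpoint)
    generated: List[str] = []
    for verse_id in verse_ids:
        if verse_id in done:
            continue  # finished in an earlier session
        synthesize(verse_id)
        done.add(verse_id)
        # Persist after every verse so a disconnect loses at most one item.
        checkpoint.write_text(json.dumps(sorted(done)), encoding="utf-8")
        generated.append(verse_id)
    return generated


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    ids = ["BG_1_1", "BG_1_2", "BG_1_3"]
    first_run = generate_all(ids[:2], lambda _id: None, ckpt)  # session ends early
    second_run = generate_all(ids, lambda _id: None, ckpt)     # only BG_1_3 remains
```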
File Structure
public/audio/mp3/
├── 01/ # Chapter 1 (47 files)
├── 02/ # Chapter 2 (72 files)
├── ...
├── 18/ # Chapter 18 (78 files)
└── dhyanam/ # Geeta Dhyanam (9 files)
Files follow the canonical ID pattern: BG_{chapter}_{verse}.mp3
Total: 710 MP3 files (701 verses + 9 dhyanam)
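A small helper can map a chapter and verse to its expected file location. Two details are assumptions here: the directory is zero-padded to two digits (as in the tree above), while the numbers inside the filename are left unpadded because the pattern does not say otherwise. The helper name is hypothetical, and Geeta Dhyanam files are not covered.

```python
from pathlib import PurePosixPath

AUDIO_ROOT = PurePosixPath("public/audio/mp3")


def verse_audio_path(chapter: int, verse: int) -> PurePosixPath:
    """Expected MP3 path for a verse, following BG_{chapter}_{verse}.mp3."""
    if not 1 <= chapter <= 18:
        raise ValueError("the Bhagavad Gita has chapters 1 through 18")
    if verse < 1:
        raise ValueError("verse numbers start at 1")
    # Filename padding is an assumption; adjust if the files use BG_02_047 style.
    return AUDIO_ROOT / f"{chapter:02d}" / f"BG_{chapter}_{verse}.mp3"


print(verse_audio_path(2, 47))
```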
Architecture
frontend/src/
├── components/audio/
│   ├── AudioPlayerContext.tsx          # Global audio state
│   ├── MiniPlayer/                     # Reading mode player
│   │   ├── MiniPlayerActive.tsx        # Unified active player (all modes)
│   │   └── MiniPlayerModeSelector.tsx  # Listen/Read/Study mode picker
│   └── FloatingAudioBar.tsx            # Fixed bottom bar during scroll
├── hooks/
│   ├── useAutoAdvance.ts               # Listen/Read mode logic
│   ├── useStudyMode.ts                 # Section sequencing (single verse)
│   ├── useStudyAutoMode.ts             # Study Auto Mode orchestration
│   └── useAudioCache.ts                # Cache status and management
└── lib/
    └── audioPreload.ts                 # Background preloading
Study Auto Mode Hook Composition:
useStudyAutoMode
├── useStudyMode (section sequencing: sanskrit → english → hindi → insight)
├── AudioPlayerContext (Sanskrit audio files)
└── TTSContext (Edge TTS for translations/commentary)
The audio context ensures only one audio plays at a time. When a new verse starts, any playing audio stops automatically.
Text-to-Speech (TTS) API
For user-selected text (translations, commentaries), Geetanjali provides real-time TTS via the /api/v1/tts endpoint.
Voices
| Language | Voice | Description |
|---|---|---|
| Hindi (hi) | hi-IN-MadhurNeural | Male, clear Sanskrit pronunciation |
| English (en) | en-US-AriaNeural | Female, natural reading voice |
Caching
TTS responses are cached in Redis to reduce latency and API costs:
- TTL: 24 hours
- Key format: tts:{lang}:{rate}:{pitch}:{text_hash}
- Cache hit: Served immediately with X-Cache: HIT header
- Cache miss: Generated via Edge TTS, then cached
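A cache key in the documented format could be built as follows. The hash function (SHA-256, truncated to 16 hex characters) and the default rate and pitch strings are assumptions; the backend may hash or normalize the text differently.

```python
import hashlib

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL


def tts_cache_key(text: str, lang: str, rate: str = "+0%", pitch: str = "+0Hz") -> str:
    """Build a key shaped like tts:{lang}:{rate}:{pitch}:{text_hash}."""
    # Assumed hashing scheme: SHA-256 of the UTF-8 text, first 16 hex chars.
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"tts:{lang}:{rate}:{pitch}:{text_hash}"


print(tts_cache_key("You have a right to action alone.", "en"))
```

Because rate and pitch are part of the key, the same text requested at a different speed is cached as a separate entry.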
Metrics
Prometheus metrics for monitoring TTS health:
| Metric | Type | Labels | Description |
|---|---|---|---|
| geetanjali_tts_requests_total | Counter | lang, result | Total TTS requests |
| geetanjali_tts_cache_hits_total | Counter | — | Cache hit count |
| geetanjali_tts_cache_misses_total | Counter | — | Cache miss count |
Grafana dashboard: The main Geetanjali dashboard includes a TTS panel showing request volume and cache hit rate.
Rate Limiting
- Limit: 30 requests/minute per user/IP
- Scope: Applied at the API gateway level
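To make the limit concrete, here is a sliding-window limiter for 30 requests per minute per client. The section does not say which algorithm the gateway uses (fixed window, sliding window, or token bucket), so treat this as one plausible reading; the class and parameter names are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_s` seconds."""

    def __init__(self, limit: int = 30, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Return True and record the request if the client is under the limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The `client_id` would be the user ID when authenticated and the IP address otherwise; a rejected request would normally be answered with HTTP 429.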
Acknowledgments
- AI4Bharat for the Indic Parler-TTS model
- Sanskrit text from traditional sources