Audio Recitations

Geetanjali includes AI-generated Sanskrit recitations of the Bhagavad Gita verses. Hear proper pronunciation while you read.

Overview

Each verse can be played aloud as a natural-sounding Sanskrit recitation. The voice is Aryan, a male voice trained specifically for Indian languages, with clear enunciation and a measured pace suited to contemplation.

Technology: Indic Parler-TTS by AI4Bharat

Current coverage: All 700 verses + 9 Geeta Dhyanam invocations


How It Works

Sanskrit Text → AI Voice Synthesis → Audio File
      ↓                ↓                  ↓
  Devanagari      Aryan voice       MP3 (128kbps)
  from source     with curated      ready for
  scripture       tempo/emotion     playback

Each verse is individually processed with metadata that controls:

Aspect     Effect
Tempo      Measured pace for most verses; slower for key teachings
Emotion    Serene for wisdom, firm for commands, compassionate for consolation
Emphasis   Special treatment for maha vakyas (great sayings)
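Parler-TTS models, including Indic Parler-TTS, are conditioned on a free-text description of the voice (speaker, pace, tone, recording quality) in addition to the text to speak. The sketch below shows one way curated tempo/emotion/emphasis metadata could be turned into such a description. The `VerseVoiceMeta` fields, the `build_description` helper, and all phrasings are hypothetical; the actual metadata schema and prompts used by Geetanjali may differ.

```python
from dataclasses import dataclass

# Hypothetical per-verse record. The real schema lives in
# backend/data/verse_audio_metadata/chapter_XX.py and may differ.
@dataclass(frozen=True)
class VerseVoiceMeta:
    canonical_id: str          # e.g. "BG_2_47"
    tempo: str                 # "measured" (default) or "slow" (key teachings)
    emotion: str               # e.g. "serene", "firm", "compassionate"
    maha_vakya: bool = False   # foundational verse that gets extra emphasis

# Illustrative phrasings only; not the production prompts.
TEMPO_PHRASES = {
    "measured": "at a measured, moderate pace",
    "slow": "slowly and deliberately",
}

def build_description(meta: VerseVoiceMeta, speaker: str = "Aryan") -> str:
    """Compose a free-text voice description from curated verse metadata."""
    parts = [
        f"{speaker} recites {TEMPO_PHRASES[meta.tempo]}",
        f"in a {meta.emotion} tone",
    ]
    if meta.maha_vakya:
        parts.append("giving each word clear emphasis")
    # Parler-TTS descriptions commonly also describe the recording quality.
    parts.append("with a very clear recording and no background noise")
    return ", ".join(parts) + "."
```

For BG 2.47 marked as slow, serene, and a maha vakya, this yields "Aryan recites slowly and deliberately, in a serene tone, giving each word clear emphasis, with a very clear recording and no background noise." Keeping the description derived from structured fields means a tempo or emotion change is a one-line metadata edit rather than a hand-edited prompt.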

Maha Vakyas

Foundational verses like BG 2.47 (karmaṇy evādhikāras te) receive enhanced treatment:


Audio Quality


Playback Features

Single Verse

Click the play button on any verse card or detail page. Controls include:

Reading Mode

Sequential playback with two auto-advance modes:

Mode     Behavior
Listen   Plays the verse audio, then advances after an 800 ms pause
Read     Silent reading; advances on a timer set to 80% of the verse's audio duration

The next verse's audio starts preloading when the current verse reaches 80% progress, so there is no gap between verses.
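The timing rules above reduce to a few constants. A minimal sketch of that arithmetic, expressed as delays measured from the start of a verse; the function and constant names are hypothetical, and the real logic lives in the TypeScript hook useAutoAdvance.ts, presumably driven by audio events rather than a single precomputed delay.

```python
LISTEN_PAUSE_S = 0.8   # Listen mode: pause after the recitation ends
READ_FRACTION = 0.8    # Read mode: timer length as a fraction of audio duration
PRELOAD_AT = 0.8       # progress fraction at which the next verse is preloaded

def advance_delay(mode: str, duration_s: float) -> float:
    """Seconds from the start of a verse until auto-advance should fire."""
    if mode == "listen":
        return duration_s + LISTEN_PAUSE_S
    if mode == "read":
        return duration_s * READ_FRACTION
    raise ValueError(f"unknown mode: {mode!r}")

def should_preload(position_s: float, duration_s: float) -> bool:
    """True once playback has reached the preload threshold."""
    return duration_s > 0 and position_s / duration_s >= PRELOAD_AT
```

For a 10-second recitation, Listen mode advances at 10.8 s and Read mode at 8.0 s; in both cases preloading begins at the 8-second mark.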

Study Mode

Guided narration through verses with auto-advance:

Chapter Intro → Verse 1 → Verse 2 → ... → Chapter Complete
                   ↓
      Sanskrit → English → Hindi → Insight → Next

Flow:

  1. Chapter intro (summary narration via TTS)
  2. Verse announcement (“Verse 1 of 72”)
  3. Sanskrit audio recitation
  4. English translation TTS
  5. Hindi translation TTS (if enabled)
  6. Insight/commentary TTS
  7. Auto-advance to next verse (2s pause)
  8. Chapter completion prompt at end
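The flow above is a fixed ordering of narration steps per chapter. The sketch below reproduces only that ordering as a generator of step labels; the `study_steps` name and label format are hypothetical, and the actual sequencing is done in the TypeScript hooks useStudyMode and useStudyAutoMode.

```python
from typing import Iterator

def study_steps(chapter: int, verse_count: int, include_hindi: bool = True) -> Iterator[str]:
    """Yield Study Mode steps for one chapter, in playback order."""
    yield f"chapter_intro:{chapter}"
    sections = ["sanskrit", "english"]
    if include_hindi:                      # Hindi translation is optional
        sections.append("hindi")
    sections.append("insight")
    for verse in range(1, verse_count + 1):
        yield f"announce:Verse {verse} of {verse_count}"
        for section in sections:
            yield f"{section}:{chapter}.{verse}"
        if verse < verse_count:            # 2 s pause only between verses
            yield "pause:2s"
    yield f"chapter_complete:{chapter}"
```

With Hindi disabled, a two-verse chapter produces: intro, announce 1, sanskrit/english/insight for verse 1, pause, announce 2, sanskrit/english/insight for verse 2, completion.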

Trigger: Study button in MiniPlayer

Controls:

Action         Gesture      Keyboard
Pause/Resume   Tap screen   Space
Skip section
Skip verse
Stop           Tap Stop     Escape

Settings (in Settings page):

UI: Progress bar, section dots, verse position counter, status text.

Accessibility: aria-live announcements, 44px touch targets, keyboard shortcuts.

Media Controls

Lock screen and notification controls work via the Media Session API. Pause, resume, and see verse info without unlocking your device.


Offline Audio

Audio files are cached automatically for offline playback.

How it works:

Cache limits:

Reliability features:

Technical:


For Contributors

Generation Workflow

Audio generation uses a three-step pipeline:

  1. Export metadata from the database (runs locally via Docker)
  2. Generate audio using Indic Parler-TTS in Google Colab (GPU required)
  3. Post-process WAV to MP3 and organize files

Scripts are in backend/scripts/:

Script                   Purpose
export_tts_metadata.py   Export verse text + voice parameters
indic_parler_tts.ipynb   Colab notebook for TTS generation
process_tts_audio.py     Convert WAV → MP3, organize files

Adding a New Chapter

  1. Curate voice metadata in backend/data/verse_audio_metadata/chapter_XX.py
  2. Export: docker compose exec backend python /app/scripts/export_tts_metadata.py --chapter XX
  3. Generate in Colab (upload JSON, run cells, download ZIP)
  4. Process: python3 backend/scripts/process_tts_audio.py ~/Downloads/chapter_XX_wav.zip
  5. Commit MP3 files to public/audio/mp3/XX/
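The internals of process_tts_audio.py are not shown here. One common way to do the WAV → MP3 step is to call ffmpeg with the LAME encoder at the 128 kbps bitrate used for the published files. A sketch, assuming ffmpeg is on the PATH and that each MP3 keeps its WAV's base name; `mp3_command` and `convert_dir` are hypothetical helpers, not the script's API.

```python
import subprocess
from pathlib import Path

def mp3_command(wav: Path, out_dir: Path, bitrate: str = "128k") -> list[str]:
    """Build an ffmpeg argv that encodes one WAV file to a constant-bitrate MP3."""
    out = out_dir / (wav.stem + ".mp3")   # keep the WAV's base name, e.g. BG_2_47
    return [
        "ffmpeg", "-y",              # overwrite an existing MP3
        "-i", str(wav),
        "-codec:a", "libmp3lame",    # LAME MP3 encoder
        "-b:a", bitrate,             # 128 kbps by default
        str(out),
    ]

def convert_dir(wav_dir: Path, out_dir: Path) -> int:
    """Convert every *.wav in wav_dir into out_dir; return the number converted."""
    out_dir.mkdir(parents=True, exist_ok=True)
    wavs = sorted(wav_dir.glob("*.wav"))
    for wav in wavs:
        subprocess.run(mp3_command(wav, out_dir), check=True)  # raise on ffmpeg failure
    return len(wavs)
```

Building the argv separately from running it keeps the conversion settings easy to inspect and test without invoking ffmpeg.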

Resuming Interrupted Generation

The Colab notebook checkpoints progress. If your session disconnects:

  1. Reconnect and re-run setup cells
  2. Re-upload the same metadata JSON
  3. Run the generation cell—it resumes from the checkpoint
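Resuming works because finished verses are recorded as they complete. A minimal sketch of that pattern, assuming the checkpoint is a JSON list of finished canonical IDs stored somewhere that survives a disconnect (for example Google Drive); `generate_with_checkpoint` and the file format are hypothetical, not the notebook's actual code.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

def generate_with_checkpoint(
    verse_ids: Iterable[str],
    synthesize: Callable[[str], None],
    checkpoint: Path,
) -> list[str]:
    """Synthesize each verse not yet in the checkpoint; return the IDs generated now."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    generated = []
    for vid in verse_ids:
        if vid in done:
            continue                      # finished in an earlier session
        synthesize(vid)
        done.add(vid)
        generated.append(vid)
        # Persist after every verse so a disconnect loses at most the one in progress.
        checkpoint.write_text(json.dumps(sorted(done)))
    return generated
```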

File Structure

public/audio/mp3/
├── 01/           # Chapter 1 (47 files)
├── 02/           # Chapter 2 (72 files)
├── ...
├── 18/           # Chapter 18 (78 files)
└── dhyanam/      # Geeta Dhyanam (9 files)

Files follow the canonical ID pattern: BG_{chapter}_{verse}.mp3

Total: 710 MP3 files (701 verses + 9 dhyanam)
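A small helper can map a verse to its file and flag gaps before committing. This sketch assumes the chapter directory is zero-padded (as in the tree above) while the numbers inside the file name are not, which the pattern BG_{chapter}_{verse}.mp3 suggests but does not confirm; dhyanam file names are not covered. Both function names are hypothetical.

```python
from pathlib import Path

MP3_ROOT = Path("public/audio/mp3")

def verse_audio_path(chapter: int, verse: int, root: Path = MP3_ROOT) -> Path:
    """Path of one verse's MP3, e.g. public/audio/mp3/02/BG_2_47.mp3 (assumed naming)."""
    return root / f"{chapter:02d}" / f"BG_{chapter}_{verse}.mp3"

def missing_verses(chapter: int, verse_count: int, root: Path = MP3_ROOT) -> list[Path]:
    """List expected MP3 paths for a chapter that do not exist on disk."""
    paths = (verse_audio_path(chapter, v, root) for v in range(1, verse_count + 1))
    return [p for p in paths if not p.exists()]
```

Running `missing_verses(2, 72)` from the repository root should return an empty list once chapter 2 is fully committed.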


Architecture

frontend/src/
├── components/audio/
│   ├── AudioPlayerContext.tsx   # Global audio state
│   ├── MiniPlayer/              # Reading mode player
│   │   ├── MiniPlayerActive.tsx # Unified active player (all modes)
│   │   └── MiniPlayerModeSelector.tsx # Listen/Read/Study mode picker
│   └── FloatingAudioBar.tsx     # Fixed bottom bar during scroll
├── hooks/
│   ├── useAutoAdvance.ts        # Listen/Read mode logic
│   ├── useStudyMode.ts          # Section sequencing (single verse)
│   ├── useStudyAutoMode.ts      # Study Auto Mode orchestration
│   └── useAudioCache.ts         # Cache status and management
└── lib/
    └── audioPreload.ts          # Background preloading

Study Auto Mode Hook Composition:

useStudyAutoMode
├── useStudyMode (section sequencing: sanskrit → english → hindi → insight)
├── AudioPlayerContext (Sanskrit audio files)
└── TTSContext (Edge TTS for translations/commentary)

AudioPlayerContext ensures that only one audio source plays at a time: when a new verse starts, any audio already playing stops automatically.


Text-to-Speech (TTS) API

For user-selected text (translations, commentaries), Geetanjali provides real-time TTS via the /api/v1/tts endpoint.

Voices

Language       Voice                Description
Hindi (hi)     hi-IN-MadhurNeural   Male, clear Sanskrit pronunciation
English (en)   en-US-AriaNeural     Female, natural reading voice

Caching

TTS responses are cached in Redis to reduce latency and API costs:
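The exact cache key and TTL are not specified here. A common approach is a cache-aside lookup keyed on a hash of the voice and text, so that identical requests hit Redis and a voice change naturally produces new keys. In this sketch the key format, the 7-day TTL, and the helper names are hypothetical; `synthesize` stands in for the Edge TTS call. A redis-py `redis.Redis()` client provides the same `get`/`setex` methods as the in-memory stand-in.

```python
import hashlib
from typing import Callable, Optional, Protocol

VOICES = {"hi": "hi-IN-MadhurNeural", "en": "en-US-AriaNeural"}

class BytesCache(Protocol):
    """The subset of a redis-py client used here."""
    def get(self, name: str) -> Optional[bytes]: ...
    def setex(self, name: str, time: int, value: bytes) -> object: ...

class DictCache:
    """In-memory stand-in for Redis, for local experiments (ignores TTLs)."""
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
    def get(self, name: str) -> Optional[bytes]:
        return self._store.get(name)
    def setex(self, name: str, time: int, value: bytes) -> bool:
        self._store[name] = value
        return True

def tts_cache_key(text: str, lang: str) -> str:
    voice = VOICES[lang]
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return f"tts:{voice}:{digest}"            # hypothetical key format

def get_or_synthesize(cache: BytesCache, text: str, lang: str,
                      synthesize: Callable[[str, str], bytes],
                      ttl_s: int = 7 * 24 * 3600) -> bytes:
    """Return cached audio if present; otherwise synthesize, cache, and return it."""
    key = tts_cache_key(text, lang)
    cached = cache.get(key)
    if cached is not None:
        return cached
    audio = synthesize(text, VOICES[lang])
    cache.setex(key, ttl_s, audio)            # expire after ttl_s seconds
    return audio
```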

Metrics

Prometheus metrics for monitoring TTS health:

Metric                              Type      Labels         Description
geetanjali_tts_requests_total       Counter   lang, result   Total TTS requests
geetanjali_tts_cache_hits_total     Counter                  Cache hit count
geetanjali_tts_cache_misses_total   Counter                  Cache miss count

Grafana dashboard: The main Geetanjali dashboard includes a TTS panel showing request volume and cache hit rate.
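Counters like these are typically declared with the official prometheus_client library. A sketch of how the three metrics could be declared and updated; the `record_request` helper and the `result` label values are hypothetical, and a private `CollectorRegistry` is used only to keep the example isolated (an application would normally use the default registry).

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# prometheus_client adds the "_total" suffix to counter samples, so these are
# exported as geetanjali_tts_requests_total, geetanjali_tts_cache_hits_total, etc.
TTS_REQUESTS = Counter("geetanjali_tts_requests", "Total TTS requests",
                       ["lang", "result"], registry=registry)
TTS_CACHE_HITS = Counter("geetanjali_tts_cache_hits", "Cache hit count",
                         registry=registry)
TTS_CACHE_MISSES = Counter("geetanjali_tts_cache_misses", "Cache miss count",
                           registry=registry)

def record_request(lang: str, result: str, cache_hit: bool) -> None:
    """Count one TTS request and whether it was served from the cache."""
    TTS_REQUESTS.labels(lang=lang, result=result).inc()
    (TTS_CACHE_HITS if cache_hit else TTS_CACHE_MISSES).inc()
```

A hit-rate panel can be built from these counters, for example as rate(hits) divided by rate(hits) + rate(misses) over a 5-minute window; the query the Geetanjali dashboard actually uses is not shown here.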

Rate Limiting


Acknowledgments