Audio Recitations

Geetanjali includes AI-generated Sanskrit recitations of the Bhagavad Gita verses. Hear proper pronunciation while you read.

Overview

Each verse can be played aloud as a natural-sounding Sanskrit recitation. The voice is Aryan, a male voice trained specifically for Indian languages, with clear enunciation and a measured pace suited to contemplation.

Technology: Indic Parler-TTS by AI4Bharat

Current coverage: All 701 verses + 9 Gita Dhyanam invocations (710 recordings)

How It Works

Sanskrit Text → AI Voice Synthesis → Audio File
      ↓                ↓                  ↓
  Devanagari      Aryan voice       MP3 (128kbps)
  from source     with curated      ready for
  scripture       tempo/emotion     playback

Each verse is individually processed with metadata that controls:

Aspect	Effect
Tempo	Measured pace for most verses; slower for key teachings
Emotion	Serene for wisdom, firm for commands, compassionate for consolation
Emphasis	Special treatment for maha vakyas (great sayings)

Maha Vakyas

Foundational verses like BG 2.47 (karmaṇy evādhikāras te) receive enhanced treatment:

Slower, more deliberate delivery
Greater gravitas in tone
Pauses for absorption
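Parler-TTS models, including Indic Parler-TTS, are conditioned on a free-text description of the voice in addition to the text to speak. The sketch below assumes the curated tempo/emotion metadata is rendered into such a description, with maha vakyas forced to a slow, weighty delivery. `VerseVoice`, `build_description`, the field names, and the phrase wording are all hypothetical; the real `chapter_XX.py` metadata format may differ.

```python
from dataclasses import dataclass

# Hypothetical shape of one verse's curated voice metadata.
@dataclass
class VerseVoice:
    tempo: str = "measured"    # "measured" or "slow"
    emotion: str = "serene"    # "serene", "firm", or "compassionate"
    maha_vakya: bool = False   # great sayings get enhanced treatment

TEMPO_PHRASES = {
    "measured": "at a measured pace",
    "slow": "slowly and deliberately",
}

EMOTION_PHRASES = {
    "serene": "a serene, contemplative tone",
    "firm": "a firm, commanding tone",
    "compassionate": "a warm, compassionate tone",
}

def build_description(meta: VerseVoice, speaker: str = "Aryan") -> str:
    """Render verse metadata as a Parler-TTS style voice description (illustrative wording)."""
    tempo = "slow" if meta.maha_vakya else meta.tempo  # maha vakyas always slow down
    sentence = f"{speaker} recites {TEMPO_PHRASES[tempo]} in {EMOTION_PHRASES[meta.emotion]}"
    if meta.maha_vakya:
        sentence += ", with great gravitas and brief pauses between phrases"
    return sentence + ". The recording is very clear, with no background noise."
```

For BG 2.47, `build_description(VerseVoice(emotion="firm", maha_vakya=True))` would yield a description that starts "Aryan recites slowly and deliberately in a firm, commanding tone". Keeping the vocabulary in small lookup tables makes curated metadata easy to review.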

Audio Quality

Format: MP3, 128kbps stereo
Duration: 7–15 seconds per verse (varies with length)
Voice: Aryan (male, Sanskrit-optimized)
Clarity: Optimized for both speakers and headphones

Playback Features

Single Verse

Click the play button on any verse card or detail page. Controls include:

Play/pause toggle
Progress bar with seek
Playback speed (0.75×, 1×, 1.25×)
Loop mode for memorization

Reading Mode

Sequential playback with two auto-advance modes:

Mode	Behavior
Listen	Plays the recitation, then advances after an 800ms pause
Read	Advances on a timer set to 80% of the verse's audio duration (approximating silent reading pace)

The next verse's audio preloads once the current verse reaches 80% progress, so there is no gap between verses.
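The timing rules above reduce to a little arithmetic. This is a minimal sketch, assuming uninterrupted playback at 1× speed; the function names are hypothetical and the real logic lives in `useAutoAdvance.ts`.

```python
LISTEN_GAP_MS = 800   # Listen mode: pause after the recitation ends
READ_FRACTION = 0.8   # Read mode: timer at 80% of the audio duration
PRELOAD_AT = 0.8      # preload the next verse at 80% progress

def advance_delay_ms(mode: str, duration_ms: int) -> int:
    """Milliseconds from verse start until auto-advance (uninterrupted 1x playback assumed)."""
    if mode == "listen":
        return duration_ms + LISTEN_GAP_MS
    if mode == "read":
        return round(duration_ms * READ_FRACTION)
    raise ValueError(f"unknown mode: {mode!r}")

def should_preload(position_ms: float, duration_ms: float, already_preloaded: bool) -> bool:
    """True the first time playback crosses the preload threshold."""
    if already_preloaded or duration_ms <= 0:
        return False
    return position_ms / duration_ms >= PRELOAD_AT
```

For a 10-second verse, Listen mode advances at 10.8 s and Read mode at 8 s, and in both modes the next file starts loading at 8 s.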

Study Mode

Guided narration through verses with auto-advance:

Chapter Intro → Verse 1 → Verse 2 → ... → Chapter Complete
                   ↓
      Sanskrit → English → Hindi → Insight → Next

Flow:

Chapter intro (summary narration via TTS)
Verse announcement (“Verse 1 of 72”)
Sanskrit audio recitation
English translation TTS
Hindi translation TTS (if enabled)
Insight/commentary TTS
Auto-advance to next verse (2s pause)
Chapter completion prompt at end

Trigger: Study button in MiniPlayer

Controls:

Action	Gesture	Keyboard
Pause/Resume	Tap screen	Space
Skip section	—	→
Skip verse	—	↓
Stop	Tap Stop	Escape

Options (on the Settings page):

Include Hindi translations
Include commentary insights
Play chapter introduction
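The flow and the three settings above determine the sequence of steps for a chapter. This sketch flattens that sequence into a list; the step labels and `build_study_sequence` are hypothetical, and the real orchestration is split between `useStudyMode` and `useStudyAutoMode`. It assumes the 2s pause occurs only between verses, not before the completion prompt.

```python
from typing import List

STUDY_PAUSE_MS = 2000  # pause before auto-advancing to the next verse

def build_study_sequence(chapter: int, verse_count: int, *, include_hindi: bool,
                         include_insight: bool, play_intro: bool) -> List[str]:
    """Hypothetical flat list of Study mode steps for one chapter."""
    sections = ["sanskrit", "english"]  # always played
    if include_hindi:
        sections.append("hindi")
    if include_insight:
        sections.append("insight")

    steps: List[str] = []
    if play_intro:
        steps.append(f"intro:{chapter}")
    for verse in range(1, verse_count + 1):
        steps.append(f"announce:Verse {verse} of {verse_count}")
        steps.extend(f"{section}:{chapter}.{verse}" for section in sections)
        if verse < verse_count:
            steps.append(f"pause:{STUDY_PAUSE_MS}")
    steps.append(f"complete:{chapter}")
    return steps
```

With Hindi turned off, each verse contributes an announcement, then Sanskrit, English, and insight steps; skipping a section or a verse amounts to jumping ahead in this list.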

UI: Progress bar, section dots, verse position counter, status text.

Accessibility: aria-live announcements, 44px touch targets, keyboard shortcuts.

Media Controls

Lock screen and notification controls work via the Media Session API. Pause, resume, and see verse info without unlocking your device.

Offline Audio

Audio files cache automatically for offline playback.

How it works:

Audio streams immediately from the server (nginx); caching never delays playback
Full file cached in background for subsequent plays
Cloud icon in MiniPlayer indicates cached status

Cache limits:

100MB quota with LRU eviction
Manage in Settings → Audio Cache (see file count, clear cache)
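The 100MB quota with LRU eviction can be modeled as byte accounting over an ordered map. This is an in-memory sketch of the bookkeeping only; the real files live in the service worker's cache, and `LruAudioCache` is a hypothetical name. Treating 100MB as 100 × 1024 × 1024 bytes is also an assumption.

```python
from collections import OrderedDict
from typing import Dict, Optional

QUOTA_BYTES = 100 * 1024 * 1024  # 100MB quota (binary megabytes assumed)

class LruAudioCache:
    """Byte-quota cache that evicts the least recently used file first."""

    def __init__(self, quota_bytes: int = QUOTA_BYTES) -> None:
        self.quota = quota_bytes
        self.used = 0
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def get(self, url: str) -> Optional[bytes]:
        data = self._entries.get(url)
        if data is not None:
            self._entries.move_to_end(url)  # mark as most recently used
        return data

    def put(self, url: str, data: bytes) -> None:
        if len(data) > self.quota:
            return  # a file larger than the whole quota is never cached
        if url in self._entries:
            self.used -= len(self._entries.pop(url))
        self._entries[url] = data
        self.used += len(data)
        while self.used > self.quota:
            _, evicted = self._entries.popitem(last=False)  # evict least recently used
            self.used -= len(evicted)

    def stats(self) -> Dict[str, int]:
        """File count and bytes used, as shown under Settings → Audio Cache."""
        return {"files": len(self._entries), "bytes": self.used}

    def clear(self) -> None:
        self._entries.clear()
        self.used = 0
```

Because `get` refreshes an entry's position, verses you replay often (for example, while memorizing in loop mode) survive eviction longer than verses heard once.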

Reliability features:

Cache-first reads: cached files are served without revalidation, for fast playback
20-second load timeout with a clear error message
Downloads are validated (size, completeness) before they are written to the cache
Background caching: playback starts immediately while the file is cached asynchronously
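The size and completeness checks might look like this sketch. The specific checks are assumptions, since the docs only name "size, completeness": the body must match the advertised Content-Length, exceed a minimum size, and begin with either an ID3v2 tag or an MPEG audio frame sync (11 set bits). `is_complete_mp3` and `MIN_MP3_BYTES` are hypothetical.

```python
from typing import Optional

MIN_MP3_BYTES = 1024  # hypothetical floor; real verse files are far larger

def is_complete_mp3(data: bytes, content_length: Optional[int]) -> bool:
    """Return True if a downloaded body looks like a whole MP3 worth caching."""
    if content_length is not None and len(data) != content_length:
        return False  # truncated (or padded) transfer
    if len(data) < MIN_MP3_BYTES:
        return False  # too small to be a recitation; likely an error body
    has_id3_tag = data[:3] == b"ID3"                                # ID3v2 header
    has_frame_sync = data[0] == 0xFF and (data[1] & 0xE0) == 0xE0  # MPEG frame sync
    return has_id3_tag or has_frame_sync
```

Rejecting a bad download only skips caching; since playback already started from the network, the listener is not affected.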

Technical:

Service Worker intercepts /audio/ requests
Cache hit: serves immediately without validation
Cache miss: fetches from network, caches in background
Supports Range requests for seeking within cached files
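Seeking within a cached file means answering an HTTP Range request from stored bytes. The real handler runs in the service worker (JavaScript); this Python sketch only models the byte arithmetic for a single `bytes=` range, returning 206 with a Content-Range header, or 416 when the range is unsatisfiable. `parse_range` and `slice_range` are hypothetical names. For brevity, malformed headers are treated like unsatisfiable ones; a server may instead ignore them and return the whole file.

```python
import re
from typing import Dict, Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")

def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Inclusive (start, end) for a single byte range, or None if unusable."""
    m = _RANGE_RE.match(header.strip())
    if m is None or size == 0:
        return None
    start_s, end_s = m.groups()
    if start_s == "" and end_s == "":
        return None
    if start_s == "":  # suffix range "bytes=-N": the last N bytes
        length = int(end_s)
        return None if length == 0 else (max(size - length, 0), size - 1)
    start = int(start_s)
    end = int(end_s) if end_s else size - 1  # open-ended "bytes=N-"
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)

def slice_range(data: bytes, header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, body) for a Range request against cached bytes."""
    rng = parse_range(header, len(data))
    if rng is None:
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    start, end = rng
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
    return 206, headers, data[start:end + 1]
```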

For Contributors

Generation Workflow

Audio generation uses a three-step pipeline:

Export metadata from the database (runs locally via Docker)
Generate audio using Indic Parler-TTS in Google Colab (GPU required)
Post-process WAV to MP3 and organize files

Scripts are in backend/scripts/:

Script	Purpose
`export_tts_metadata.py`	Export verse text + voice parameters
`indic_parler_tts.ipynb`	Colab notebook for TTS generation
`process_tts_audio.py`	Convert WAV → MP3, organize files
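How `process_tts_audio.py` encodes MP3s is not documented here. One plausible approach, shown as a hypothetical sketch, is to shell out to ffmpeg with the LAME encoder at 128kbps. The `-ac 2` flag, which upmixes to two channels to match the documented stereo output, is likewise an assumption. `mp3_command` and `convert` are illustrative names.

```python
import subprocess
from pathlib import Path
from typing import List

def mp3_command(wav: Path, mp3: Path, bitrate: str = "128k") -> List[str]:
    """ffmpeg arguments for one WAV → MP3 conversion (hypothetical settings)."""
    return [
        "ffmpeg", "-y",           # overwrite an existing output file
        "-i", str(wav),
        "-codec:a", "libmp3lame",  # LAME MP3 encoder
        "-b:a", bitrate,           # 128kbps, as documented
        "-ac", "2",                # two channels (assumed, to match "stereo")
        str(mp3),
    ]

def convert(wav: Path, mp3: Path) -> None:
    """Run the conversion; requires ffmpeg on PATH."""
    mp3.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(mp3_command(wav, mp3), check=True)
```

Building the argument list separately from running it keeps the encoding settings testable without ffmpeg installed.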

Adding a New Chapter

Curate voice metadata in backend/data/verse_audio_metadata/chapter_XX.py
Export: docker compose exec backend python /app/scripts/export_tts_metadata.py --chapter XX
Generate in Colab (upload JSON, run cells, download ZIP)
Process: python3 backend/scripts/process_tts_audio.py ~/Downloads/chapter_XX_wav.zip
Commit MP3 files to public/audio/mp3/XX/

Resuming Interrupted Generation

The Colab notebook checkpoints progress. If your session disconnects:

Reconnect and re-run setup cells
Re-upload the same metadata JSON
Run the generation cell—it resumes from the checkpoint
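Resuming works because finished verse IDs are persisted as each one completes. This is a minimal sketch of that pattern, assuming a JSON file of completed IDs. The notebook's actual checkpoint format is not documented, and `generate_with_checkpoint` and the `synthesize` callback are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

def generate_with_checkpoint(verse_ids: Iterable[str], synthesize: Callable[[str], None],
                             checkpoint: Path) -> int:
    """Generate audio for each verse not already in the checkpoint; return how many ran."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    generated = 0
    for verse_id in verse_ids:
        if verse_id in done:
            continue  # finished in an earlier session
        synthesize(verse_id)
        done.add(verse_id)
        checkpoint.write_text(json.dumps(sorted(done)))  # persist after every verse
        generated += 1
    return generated
```

Writing the checkpoint after every verse means a disconnect loses at most the verse in progress.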

File Structure

public/audio/mp3/
├── 01/           # Chapter 1 (47 files)
├── 02/           # Chapter 2 (72 files)
├── ...
├── 18/           # Chapter 18 (78 files)
└── dhyanam/      # Gita Dhyanam (9 files)

Files follow the canonical ID pattern: BG_{chapter}_{verse}.mp3

Total: 710 MP3 files (701 verses + 9 dhyanam)
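A small helper can map between canonical IDs and file paths. This sketch assumes chapter directories are zero-padded, as in the tree above, while the numbers inside the filename are not; the docs show only `BG_{chapter}_{verse}.mp3`, so check the padding against real files. Dhyanam files are deliberately rejected, because their naming is not specified. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

AUDIO_ROOT = Path("public/audio/mp3")
ID_RE = re.compile(r"^BG_(\d+)_(\d+)$")

def verse_audio_path(chapter: int, verse: int, root: Path = AUDIO_ROOT) -> Path:
    # Directory is zero-padded (01/ ... 18/); filename numbers assumed unpadded.
    return root / f"{chapter:02d}" / f"BG_{chapter}_{verse}.mp3"

def parse_canonical_id(stem: str) -> Tuple[int, int]:
    """Split a canonical ID such as 'BG_2_47' into (chapter, verse)."""
    m = ID_RE.match(stem)
    if m is None:
        raise ValueError(f"not a canonical verse ID: {stem!r}")
    return int(m.group(1)), int(m.group(2))
```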

Architecture

frontend/src/
├── components/audio/
│   ├── AudioPlayerContext.tsx   # Global audio state
│   ├── MiniPlayer/              # Reading mode player
│   │   ├── MiniPlayerActive.tsx # Unified active player (all modes)
│   │   └── MiniPlayerModeSelector.tsx # Listen/Read/Study mode picker
│   └── FloatingAudioBar.tsx     # Fixed bottom bar during scroll
├── hooks/
│   ├── useAutoAdvance.ts        # Listen/Read mode logic
│   ├── useStudyMode.ts          # Section sequencing (single verse)
│   ├── useStudyAutoMode.ts      # Study Auto Mode orchestration
│   └── useAudioCache.ts         # Cache status and management
└── lib/
    └── audioPreload.ts          # Background preloading

Study Auto Mode Hook Composition:

useStudyAutoMode
├── useStudyMode (section sequencing: sanskrit → english → hindi → insight)
├── AudioPlayerContext (Sanskrit audio files)
└── TTSContext (Edge TTS for translations/commentary)

The audio context ensures that only one track plays at a time: when a new verse starts, any audio already playing stops automatically.
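The one-track-at-a-time rule is a simple invariant: the coordinator holds at most one current track and stops it before starting another. This is a language-neutral model in Python; the real implementation is the React `AudioPlayerContext`, and `Track` and `AudioCoordinator` are hypothetical.

```python
from typing import Optional

class Track:
    """Hypothetical stand-in for a playable audio element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False

class AudioCoordinator:
    """Holds the single current track and stops it when another starts."""

    def __init__(self) -> None:
        self.current: Optional[Track] = None

    def play(self, track: Track) -> None:
        if self.current is not None and self.current is not track:
            self.current.stop()  # stop whatever was playing before
        self.current = track
        track.play()
```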

Text-to-Speech (TTS) API

For user-selected text (translations, commentaries), Geetanjali provides real-time TTS via the /api/v1/tts endpoint.

Voices

Language	Voice	Description
Hindi (`hi`)	`hi-IN-MadhurNeural`	Male, clear Sanskrit pronunciation
English (`en`)	`en-US-AriaNeural`	Female, natural reading voice

Caching

TTS responses are cached in Redis to reduce latency and API costs:

TTL: 24 hours
Key format: tts:{lang}:{rate}:{pitch}:{text_hash}
Cache hit: Served immediately with X-Cache: HIT header
Cache miss: Generated via Edge TTS, then cached
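A cache-aside lookup with the documented key format and 24-hour TTL might look like this sketch. SHA-256 for `text_hash` is an assumption, as are Edge TTS-style rate and pitch strings such as "+0%" and "+0Hz" in the usage. `InMemoryTTLStore` stands in for Redis and mirrors redis-py's `get(name)` / `setex(name, time, value)` shape, so a real client could be passed in its place. The other names are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL

def tts_cache_key(lang: str, rate: str, pitch: str, text: str) -> str:
    # SHA-256 is an assumed choice; the docs only specify "text_hash".
    text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"tts:{lang}:{rate}:{pitch}:{text_hash}"

class InMemoryTTLStore:
    """Stand-in for Redis GET/SETEX so the sketch runs without a server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        item = self._data.get(key)
        if item is None or item[0] <= self._clock():
            self._data.pop(key, None)  # missing or expired
            return None
        return item[1]

    def setex(self, key: str, ttl_seconds: int, value: bytes) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)

Synthesize = Callable[[str, str, str, str], bytes]

def get_or_synthesize(store: InMemoryTTLStore, lang: str, rate: str, pitch: str,
                      text: str, synthesize: Synthesize) -> Tuple[bytes, bool]:
    """Return (audio, cache_hit); a hit is what X-Cache: HIT reports."""
    key = tts_cache_key(lang, rate, pitch, text)
    cached = store.get(key)
    if cached is not None:
        return cached, True
    audio = synthesize(text, lang, rate, pitch)  # e.g. a call into Edge TTS
    store.setex(key, TTL_SECONDS, audio)
    return audio, False
```

Including rate and pitch in the key matters: the same sentence spoken faster is a different audio file and must not be served from the other entry.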

Metrics

Prometheus metrics for monitoring TTS health:

Metric	Type	Labels	Description
`geetanjali_tts_requests_total`	Counter	`lang`, `result`	Total TTS requests
`geetanjali_tts_cache_hits_total`	Counter	—	Cache hit count
`geetanjali_tts_cache_misses_total`	Counter	—	Cache miss count

Grafana dashboard: The main Geetanjali dashboard includes a TTS panel showing request volume and cache hit rate.

Rate Limiting

Limit: 30 requests/minute per user/IP
Scope: Applied at the API gateway level
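The gateway's actual limiting algorithm is not stated. A sliding-window log is one common way to enforce "30 requests per minute per client", sketched here with an injectable clock. `SlidingWindowLimiter` is hypothetical, and the client key could be a user ID or an IP.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_s`-second window."""

    def __init__(self, limit: int = 30, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_s
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        recent = self.hits[client_id]
        while recent and recent[0] <= now - self.window:
            recent.popleft()  # drop requests that have left the window
        if len(recent) >= self.limit:
            return False  # the caller would respond 429 Too Many Requests
        recent.append(now)
        return True
```

Compared with a fixed per-minute counter, a sliding window prevents a client from sending 60 requests across a minute boundary.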

Acknowledgments

AI4Bharat for the Indic Parler-TTS model
Sanskrit text from traditional sources