Architecture

System design and technical decisions for Geetanjali.

Overview

Geetanjali uses retrieval-augmented generation (RAG) to ground ethical guidance in Bhagavad Geeta scripture. Users submit ethical dilemmas, the system retrieves relevant verses, and an LLM generates structured recommendations with citations.

User Query → Embedding → Vector Search → LLM Generation → Structured Output
                              ↓
                        Geeta Verses
                        (701 verses)

Components

┌──────────────────────────────────────────────────────────────┐
│                                                              │
│                       Docker Network                         │
│                                                              │
│   ┌────────────┐    ┌────────────┐    ┌────────────┐         │
│   │  Frontend  │───▶│  Backend   │───▶│ PostgreSQL │         │
│   │    :80     │    │   :8000    │    │   :5432    │         │
│   └────────────┘    └─────┬──────┘    └────────────┘         │
│                           │                                  │
│            ┌──────────────┼──────────────┐                   │
│            │              │              │                   │
│            ▼              ▼              ▼                   │
│       ┌─────────┐   ┌─────────┐   ┌─────────┐                │
│       │ChromaDB │   │  Redis  │   │ Ollama  │                │
│       │  :8000  │   │  :6379  │   │ :11434  │                │
│       └─────────┘   └─────────┘   └─────────┘                │
│                                                              │
│   ┌──────────────────────────────────────────────────────┐   │
│   │                                                      │   │
│   │  Observability (Optional)                            │   │
│   │                                                      │   │
│   │   ┌────────────┐        ┌────────────┐               │   │
│   │   │ Prometheus │───────▶│  Grafana   │               │   │
│   │   │   :9090    │        │   :3000    │               │   │
│   │   └─────┬──────┘        └────────────┘               │   │
│   │         │                                            │   │
│   │         └── Scrapes /metrics from Backend + Worker   │   │
│   │                                                      │   │
│   └──────────────────────────────────────────────────────┘   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
Component    Purpose
---------    -------
Frontend     React SPA for users; static HTML pages served to search-engine bots
Backend      FastAPI service handling auth, cases, the RAG pipeline, and verse management
Worker       RQ background processor for async analysis jobs
PostgreSQL   Cases, users, outputs, verses, feedback
ChromaDB     Vector embeddings for semantic verse search
Redis        Caching, session storage, task queues, rate limiting
Ollama       Local LLM inference (primary; self-hosted, no API costs)
Cloud LLMs   Gemini/Anthropic APIs (fallback when hardware is limited)
Prometheus   Metrics collection and time-series storage (optional)
Grafana      Dashboards, alerting, visualization (optional)

RAG Pipeline

1. Embedding

The user query and all verses are embedded with sentence-transformers/all-MiniLM-L6-v2 (384 dimensions). Embeddings are computed on the client side, in the backend container; ChromaDB stores the vectors but does not compute them.

Note: ChromaDB binds the embedding function configuration to the collection's metadata, so queries must use the same embedding function that was used when the collection was created. This is why sentence-transformers must be installed in the backend image.
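
A minimal sketch of this split, assuming the chromadb and sentence-transformers Python packages (the collection name and verse IDs are illustrative):

import chromadb
from sentence_transformers import SentenceTransformer

# The backend owns the embedding model; ChromaDB only stores and searches vectors.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384 dimensions
client = chromadb.HttpClient(host="chromadb", port=8000)
collection = client.get_or_create_collection("verses")

# Index verses with embeddings computed in the backend container
verses = ["Your right is to action alone, never to its fruits...", "..."]
collection.add(
    ids=["BG-2.47", "BG-2.48"],
    documents=verses,
    embeddings=model.encode(verses).tolist(),
)

# Query with an embedding produced by the same model
query_vec = model.encode(["How do I act without attachment to results?"]).tolist()
results = collection.query(query_embeddings=query_vec, n_results=5)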

Internal API (v1.37.0)

For memory efficiency in budget deployments, the worker delegates vector search to the backend via an internal API:

Worker                              Backend
  │                                    │
  │  POST /internal/search             │
  │  {"query": "...", "top_k": 5}      │
  │ ──────────────────────────────────>│
  │                                    │ VectorStore.search()
  │                                    │      │
  │                                    │      ▼
  │                                    │   ChromaDB
  │                                    │
  │  {"ids": [...], "distances": [...]}│
  │ <──────────────────────────────────│
  │                                    │

This lets the backend own the embedding model (~400MB) while the worker stays lightweight for job processing; worker memory usage dropped from 384MB to 128MB.
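
A sketch of the worker-side call, assuming httpx; the base URL and helper name are illustrative:

import httpx

BACKEND_URL = "http://backend:8000"  # backend service name on the Docker network

def search_verses(query: str, top_k: int = 5) -> dict:
    """Delegate vector search to the backend, which owns the embedding model."""
    resp = httpx.post(
        f"{BACKEND_URL}/internal/search",
        json={"query": query, "top_k": top_k},
        timeout=10.0,
    )
    resp.raise_for_status()
    return resp.json()  # {"ids": [...], "distances": [...]}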

Configuration:

2. Retrieval

ChromaDB performs cosine similarity search, returning top-k relevant verses with scores.

3. Multi-Pass Generation

The system uses a 5-pass refinement workflow to ensure thoughtful, well-grounded guidance. This iterative approach compensates for smaller local models—rather than relying on expensive cloud APIs, we invest computation time in refinement to achieve quality output from self-hosted inference.

  1. Acceptance — Validates the query is a genuine ethical dilemma (not factual questions or harmful requests)
  2. Draft — Generates initial reasoning without format constraints
  3. Critique — Reviews the draft for depth, gaps, and verse alignment
  4. Refine — Rewrites addressing critique, improving clarity and specificity
  5. Structure — Converts refined prose into structured JSON output

Each pass is audited for quality analysis. If structuring fails, the system reconstructs output from earlier passes with appropriate confidence flagging.
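
A hedged sketch of the five passes chained together; the prompt wording, JSON keys, and fallback shape are illustrative, not the project's actual implementation:

import json
from typing import Callable

def run_multipass(complete: Callable[[str], str], query: str, verses: str) -> dict:
    # Pass 1: acceptance gate
    accept = complete(f"Is this a genuine ethical dilemma? Answer YES or NO.\n\n{query}")
    if not accept.strip().upper().startswith("YES"):
        raise ValueError("Query rejected by acceptance pass")

    # Passes 2-4: draft, critique, refine (free-form prose, no format constraints)
    draft = complete(f"Reason through this dilemma using these verses:\n{verses}\n\n{query}")
    critique = complete(f"Critique this draft for depth, gaps, and verse alignment:\n{draft}")
    refined = complete(f"Rewrite the draft to address the critique:\n{draft}\n\nCritique:\n{critique}")

    # Pass 5: structure into JSON; fall back to earlier passes if structuring fails
    try:
        return json.loads(complete(f"Convert to JSON with keys summary, options, citations:\n{refined}"))
    except json.JSONDecodeError:
        return {"summary": refined, "options": [], "citations": [], "confidence": "low"}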

Output includes:

Resilience Patterns

Circuit Breakers

External services are protected by circuit breakers that prevent cascading failures:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   CLOSED    │────▶│    OPEN     │────▶│  HALF_OPEN  │
│  (normal)   │     │  (reject)   │     │   (probe)   │
└─────────────┘     └─────────────┘     └─────────────┘
       ▲                   ▲                  │
       └───────────────────┴──────────────────┘
        probe succeeds       probe fails

A breaker opens once the failure threshold is reached, lets a single probe call through after the recovery timeout, and closes again when the probe succeeds.

Service          Failure Threshold   Recovery Timeout   Fallback
-------          -----------------   ----------------   --------
Ollama LLM       3 consecutive       60s                Cloud fallback (if configured)
Cloud LLMs       3 consecutive       60s                Next provider or error
ChromaDB         3 consecutive       60s                SQL keyword search
Email (Resend)   5 consecutive       60s                Queue for retry
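
A minimal circuit-breaker sketch using the thresholds from the table above; the real implementation likely adds per-service registries, metrics, and async support:

import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.recovery_timeout:
                raise RuntimeError("circuit open: rejecting call")
            # Recovery timeout elapsed: half-open, let one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = time.monotonic()  # open, or re-open after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # success (or successful probe): close the circuit
        return result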

Provider Configuration

Each LLM provider has auto-tuned defaults without requiring feature flags:

Provider    Temperature            Timeout   Prompt Optimization    Structured Output
--------    -----------            -------   -------------------    -----------------
Ollama      0.3 (deterministic)    30s       Simplified prompts     Multi-pass refinement
Gemini      0.3 (deterministic)    30s       Standard prompts       Explicit JSON schema
Anthropic   0.7 (balanced)         60s       Standard prompts       No schema (native output is reliable)
Mock        0.0                    Instant   N/A                    Hardcoded JSON

Auto-tuning means the system automatically applies provider-specific optimizations without configuration. Gemini uses explicit JSON schema for structured output, Ollama uses simplified prompts for efficiency, and temperature is lowered for deterministic responses.
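
The same defaults, expressed as a hypothetical settings map (the real configuration object is an assumption):

PROVIDER_DEFAULTS = {
    "ollama":    {"temperature": 0.3, "timeout_s": 30, "prompts": "simplified", "output": "multipass"},
    "gemini":    {"temperature": 0.3, "timeout_s": 30, "prompts": "standard",   "output": "json_schema"},
    "anthropic": {"temperature": 0.7, "timeout_s": 60, "prompts": "standard",   "output": "native"},
    "mock":      {"temperature": 0.0, "timeout_s": 0,  "prompts": "n/a",        "output": "hardcoded"},
}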

Fallback Chains

LLM Inference (configurable via LLM_PROVIDER and LLM_FALLBACK_PROVIDER):

Primary Provider ──[CB open]──▶ Fallback Provider ──[CB open]──▶ Error
       │                               │
       └── ollama (recommended)        └── gemini | anthropic | mock

The ideal setup uses Ollama locally with multi-pass refinement for quality. Cloud providers (Gemini, Anthropic) are available as fallbacks when hardware resources are limited.
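
A sketch of how the chain might be walked, reusing a circuit breaker like the one above; only the LLM_PROVIDER and LLM_FALLBACK_PROVIDER names come from this section, everything else is illustrative:

import os

def generate(prompt: str, providers: dict, breakers: dict) -> str:
    chain = [os.getenv("LLM_PROVIDER", "ollama"), os.getenv("LLM_FALLBACK_PROVIDER", "gemini")]
    last_error = None
    for name in chain:
        try:
            # breakers[name] rejects immediately while its circuit is open
            return breakers[name].call(providers[name].complete, prompt)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all providers exhausted") from last_error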

Vector Search:

ChromaDB semantic search ──[CB open]──▶ PostgreSQL keyword search
         │                                      │
         └── cosine similarity                  └── ILIKE + ts_vector
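
A rough sketch of the keyword fallback, assuming SQLAlchemy; the column names (search_vector, translation) are assumptions about the schema:

from sqlalchemy import text

def keyword_search(session, query: str, top_k: int = 5):
    """PostgreSQL fallback used when the ChromaDB circuit is open."""
    sql = text("""
        SELECT id, ts_rank(search_vector, plainto_tsquery(:q)) AS rank
        FROM verses
        WHERE search_vector @@ plainto_tsquery(:q) OR translation ILIKE :pattern
        ORDER BY rank DESC
        LIMIT :k
    """)
    return session.execute(sql, {"q": query, "pattern": f"%{query}%", "k": top_k}).fetchall()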

Retry Logic

Operations use exponential backoff with jitter:
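
For example, a generic retry helper with exponential backoff and full jitter (attempt counts and delays are illustrative):

import random
import time

def retry(fn, attempts: int = 5, base: float = 0.5, cap: float = 30.0):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(cap, base * 2 ** attempt)   # exponential backoff, capped
            time.sleep(random.uniform(0, delay))    # full jitter spreads retries out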

Cache Stampede Protection

TTL values include ±10% jitter to prevent a thundering herd when keys expire. The daily verse cache expires around midnight UTC with a randomized offset.
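
A small sketch of jittered TTLs (the cache key is illustrative):

import random

def jittered_ttl(base_seconds: int, spread: float = 0.10) -> int:
    """Return the base TTL adjusted by ±spread so co-written keys expire at different times."""
    return int(base_seconds * random.uniform(1 - spread, 1 + spread))

# e.g. redis_client.setex("verse:daily", jittered_ttl(86_400), payload)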

Data Model

users ──────┬──── cases ──────── outputs
            │        │              │
            │        └── messages   └── feedback
            │
verses ─────┴──── translations
   │
   └──── commentaries

Key entities:

Authentication

Anonymous users can create and view cases. Authenticated users get persistent history.

API Design

RESTful API at /api/v1/:

/auth/*          - Login, signup, refresh, logout
/cases/*         - CRUD + analyze + follow-up conversations
/verses/*        - Browse, search, daily verse
/outputs/*       - View analysis, submit feedback, export
/messages/*      - Conversation history for cases
/contact         - Contact form submission

Follow-up Conversations

After initial analysis, users can ask follow-up questions via POST /cases/{id}/follow-up. This async endpoint:

Full OpenAPI docs at /docs when running.
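
A hypothetical client call for a follow-up question; the request and response fields are assumptions, so check /docs for the real schema:

import httpx

case_id = "42"  # an existing case
resp = httpx.post(
    f"http://localhost:8000/api/v1/cases/{case_id}/follow-up",
    json={"question": "How does this guidance change if my manager is the one at fault?"},
    headers={"Authorization": "Bearer <token>"},  # token if authenticated
)
resp.raise_for_status()
print(resp.json())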

Audio Subsystem

Geetanjali includes AI-generated Sanskrit recitations for all 701 verses plus Geeta Dhyanam invocations.

Audio Delivery

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Frontend      │────▶│     Nginx       │────▶│  /audio/mp3/    │
│   <audio>       │     │   (static)      │     │   (Git LFS)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                │
                                └── Cache-Control: 1 year, immutable

Text-to-Speech (TTS) API

Real-time TTS for user-selected text using Edge TTS:

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Frontend      │────▶│   /api/v1/tts   │────▶│    Edge TTS     │
│   POST text     │     │   (backend)     │     │   (Microsoft)   │
└─────────────────┘     └────────┬────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   Redis Cache   │
                        │   (24h TTL)     │
                        └─────────────────┘
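
A sketch of the TTS path with Redis caching, assuming the edge-tts and redis Python packages; the voice, cache key scheme, and function name are illustrative:

import hashlib

import edge_tts
import redis.asyncio as redis

r = redis.Redis(host="redis", port=6379)

async def synthesize(text: str, voice: str = "hi-IN-MadhurNeural") -> bytes:
    key = "tts:" + hashlib.sha256(f"{voice}:{text}".encode()).hexdigest()
    cached = await r.get(key)
    if cached:
        return cached
    audio = b""
    async for chunk in edge_tts.Communicate(text, voice).stream():
        if chunk["type"] == "audio":
            audio += chunk["data"]
    await r.setex(key, 86_400, audio)  # 24h TTL, matching the diagram above
    return audio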

Deployment

Docker Compose orchestrates seven core services plus two optional observability services:

# Core services (docker-compose.yml)
services:
  ollama      # LLM inference (pre-built image, models in volume)
  postgres    # Primary database
  redis       # Cache, queues, rate limiting
  chromadb    # Vector database
  backend     # FastAPI with Uvicorn
  worker      # RQ background task processor
  frontend    # Nginx serving React build

# Observability (docker-compose.observability.yml)
services:
  prometheus  # Metrics collection
  grafana     # Dashboards and alerting

Production considerations:

See Also