Sovereign, Open Source Retrieval-Augmented Generation
OpenRAG is a modular framework for exploring Retrieval-Augmented Generation (RAG) techniques. Built for transparency and rapid experimentation, it empowers teams to develop state-of-the-art document-grounded AI systems—fully ready for production-scale deployment.
“Linagora also researched the best embedder-reranker pair. Given the data it intended to feed its prototype, it calculated metrics from the SciFact dataset. It determined that the most appropriate approach was to pair KaLM-mini-instruct with GTE... or Jina v2, which offers the best performance/latency compromise.”
Silicon.fr, July 2025

Understand RAG in 5 Minutes
Watch this concise video overview to understand what RAG is and how OpenRAG lets you deploy your own AI assistant on your private documents, images, audio, and more.
Key Features
  • Open Source & Sovereign
    AGPL-licensed, auditable, and community-driven.
  • LLM-Agnostic
    Use your own model or connect to a hosted provider like Mistral, Claude, or GPT — fully flexible and plug-and-play.
  • Vector Search
    Milvus-powered vector search, with the ability to partition your knowledge base per user or team.
  • Multimodal Parsing
    Supports audio transcription, email parsing, image captioning, and layout-aware PDF processing.
  • Scalable with Ray
    Process, embed, and rerank at cluster scale using distributed tasks.
  • Modern UIs
    Web-based indexer, FastAPI, Chainlit chat, and OpenAI-compatible API.
How It Works
A RAG (Retrieval-Augmented Generation) system first retrieves the documents most relevant to the user's query from a knowledge base, then uses a language model to generate a precise, context-aware response grounded in that retrieved information.
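As a minimal sketch of that retrieve-then-generate loop — with a toy keyword-overlap retriever and a stand-in for the LLM call, not OpenRAG's actual internals:

```python
# Minimal retrieve-then-generate sketch. The scoring and the generate()
# stub are toy stand-ins; a real system uses vector search and an LLM.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call: the prompt grounds the answer in context."""
    prompt = ("Answer using only this context:\n"
              + "\n".join(context) + f"\nQ: {query}")
    return prompt  # a real system would send this prompt to an LLM

docs = [
    "Milvus is a vector database used for semantic search.",
    "Ray parallelizes Python workloads across a cluster.",
    "Chainlit provides a chat UI for LLM applications.",
]
context = retrieve("what does Ray parallelize", docs)
answer = generate("what does Ray parallelize", context)
```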
Built on Robust Open Source Foundations
  • Parallelized Processing with Ray
    OpenRAG uses Ray to parallelize chunking, embedding, and ingestion across CPUs and GPUs — enabling fast, scalable processing of large document sets, including audio files, PDFs, scanned documents, and images.
    It can be deployed seamlessly on Kubernetes for distributed, production-grade workloads.
  • Smart Chunking and Layout-Aware Parsing
    OpenRAG uses advanced loaders like Docling and Marker to parse complex layouts—including OCR-enhanced PDFs—and applies format-aware chunking enriched with metadata and context summaries. This chunk contextualization, inspired by Anthropic’s approach, significantly boosts retrieval relevance.
  • Hybrid and Contextual Retrieval
    OpenRAG blends semantic search and BM25 with optional query reformulation to improve results from vague or underspecified queries. Retrieval can also be enhanced with HyDE, which generates a hypothetical answer and uses it to guide document selection.
  • Multilingual Reranking with Infinity
    OpenRAG can optionally rerank with GTE or Jina v2 using Infinity Inference Server to filter top candidates by semantic relevance across languages and formats.
  • LLM Integration and API Compatibility
    OpenRAG is LLM-agnostic, supporting Mistral, GPT-4, Claude, and more for chat-based interactions. Its fully OpenAI-compatible chat API enables seamless integration with existing infrastructure and tools like LangChain, OpenWebUI, or N8N — no changes required.
  • Automated Evaluation Pipelines
    Built-in clustering (via UMAP + HDBSCAN) lets OpenRAG auto-generate synthetic QA datasets from your indexed documents. A local LLM scores each query-chunk pair to help you tune the retrieval strategy for precision, recall, and coverage — before deploying to production.
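The parallel-processing pattern behind the Ray pipeline can be sketched as a map over chunks. Ray's @ray.remote tasks follow the same shape; this self-contained version uses the standard library's ThreadPoolExecutor as a stand-in, with a toy embedding function:

```python
# Sketch of parallel chunk embedding. ThreadPoolExecutor stands in for
# Ray tasks so the example runs anywhere; the embed() function is a toy
# character-frequency vector, not a real embedding model.
from concurrent.futures import ThreadPoolExecutor

def embed(chunk: str) -> list[float]:
    """Toy embedding: 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

chunks = ["first chunk of a document", "second chunk", "third chunk"]
with ThreadPoolExecutor(max_workers=4) as pool:
    vectors = list(pool.map(embed, chunks))  # one task per chunk

# The Ray equivalent of the same map:
#   @ray.remote
#   def embed(chunk): ...
#   vectors = ray.get([embed.remote(c) for c in chunks])
```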
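Chunk contextualization can be illustrated in a few lines: prepend a short document-level summary to each chunk before it is embedded, so retrieval sees the chunk in context. In this sketch the summary and sample chunks are hand-written placeholders; OpenRAG derives the context with an LLM:

```python
# Sketch of chunk contextualization: each chunk is enriched with a
# document-level summary before embedding. Summary and chunks here are
# hypothetical examples.

def contextualize(chunks: list[str], doc_summary: str) -> list[str]:
    return [f"[Context: {doc_summary}] {c}" for c in chunks]

chunks = ["Revenue grew 12% year over year.", "Headcount stayed flat."]
summary = "2024 annual report of ACME Corp"
enriched = contextualize(chunks, summary)
```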
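One common way to blend a lexical ranking with a semantic one is reciprocal rank fusion (RRF), sketched below with hard-coded toy rankings. OpenRAG would compute the two inputs with BM25 and vector search over Milvus; the fusion method shown is a standard technique, not necessarily OpenRAG's exact formula:

```python
# Sketch of hybrid retrieval via reciprocal rank fusion (RRF): each
# document's fused score is the sum of 1/(k + rank) over the rankings
# it appears in, so documents ranked well by either signal rise.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]     # lexical match order
semantic_ranking = ["doc_b", "doc_d", "doc_a"] # embedding match order
fused = rrf([bm25_ranking, semantic_ranking])
```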
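Because the chat API is OpenAI-compatible, any OpenAI client can talk to it. The sketch below builds a standard chat-completions payload; the base URL and model name are placeholders that depend on your deployment:

```python
# Sketch of an OpenAI-compatible chat request. The URL and model name
# are assumptions for illustration; the payload shape is the standard
# OpenAI chat-completions format.
import json

base_url = "http://localhost:8000/v1"  # assumed deployment address
payload = {
    "model": "openrag",  # model name is deployment-specific
    "messages": [
        {"role": "user", "content": "What does our Q3 report say about churn?"},
    ],
}
body = json.dumps(payload)
# To send it with any HTTP client, e.g.:
#   requests.post(f"{base_url}/chat/completions", data=body,
#                 headers={"Content-Type": "application/json"})
```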
Building Trustworthy Enterprise RAG with Open Source Power
In this in-depth webinar, our team walks you through the process of building and deploying Retrieval-Augmented Generation (RAG) applications using OpenRAG.
Whether you're transitioning from prototypes to real-world production environments, concerned about hallucinations, or facing scalability challenges, this session is designed to help you navigate these complexities.