Sovereign, Open Source Retrieval-Augmented Generation
OpenRAG is a modular framework for exploring Retrieval-Augmented Generation (RAG) techniques. Built for transparency and rapid experimentation, it empowers teams to develop state-of-the-art document-grounded AI systems—fully ready for production-scale deployment.
“Linagora also researched the best embedder-reranker pair. Given the data it intended to feed its prototype, it calculated metrics from the SciFact dataset. It determined that the most appropriate approach was to pair KaLM-mini-instruct with GTE... or Jina v2, which offers the best performance/latency compromise.”
Silicon.fr, July 2025

Understand RAG in 5 Minutes
Watch this concise video overview to understand what RAG is and how OpenRAG lets you deploy your own AI assistant on your private documents, images, audio, and more.
Key Features
  • Open Source & Sovereign
    AGPL-licensed, auditable, and community-driven.
  • LLM-Agnostic
    Use your own model or connect to a hosted provider like Mistral, Claude, or GPT — fully flexible and plug-and-play.
  • Vector Search
    Milvus-powered vector search, with the ability to partition your knowledge base per user or team.
  • Multimodal Parsing
    Supports audio transcription, email parsing, image captioning, and layout-aware PDF processing.
  • Scalable with Ray
    Process, embed, and rerank at cluster scale using distributed tasks.
  • Modern UIs
    Web-based indexer, FastAPI, Chainlit chat, and OpenAI-compatible API.
How It Works
A RAG (Retrieval-Augmented Generation) system first retrieves the documents most relevant to the user's query from a knowledge base, then uses a language model to generate a precise, context-aware response grounded in that retrieved information.
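As a minimal sketch of that retrieve-then-generate loop — with a toy keyword-overlap retriever and a stand-in for the LLM call, not OpenRAG's actual internals:

```python
# Minimal retrieve-then-generate sketch. The scoring and the generate()
# stub are toy stand-ins; a real system uses vector search and an LLM.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call: the prompt grounds the answer in context."""
    prompt = ("Answer using only this context:\n"
              + "\n".join(context) + f"\nQ: {query}")
    return prompt  # a real system would send this prompt to an LLM

docs = [
    "Milvus is a vector database used for semantic search.",
    "Ray parallelizes Python workloads across a cluster.",
    "Chainlit provides a chat UI for LLM applications.",
]
context = retrieve("what does Ray parallelize", docs)
answer = generate("what does Ray parallelize", context)
```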
Built on Robust Open Source Foundations
  • Parallelized Processing with Ray
    OpenRAG uses Ray to parallelize chunking, embedding, and ingestion across CPUs and GPUs — enabling fast, scalable processing of large document sets, including audio files, PDFs, scanned documents, and images.
    It can be deployed seamlessly on Kubernetes for distributed, production-grade workloads.
  • Smart Chunking and Layout-Aware Parsing
    OpenRAG uses advanced loaders like Docling and Marker to parse complex layouts—including OCR-enhanced PDFs—and applies format-aware chunking enriched with metadata and context summaries. This chunk contextualization, inspired by Anthropic’s approach, significantly boosts retrieval relevance.
  • Hybrid and Contextual Retrieval
    OpenRAG blends semantic search and BM25 with optional query reformulation to improve results from vague or underspecified queries. Retrieval can also be enhanced with HyDE, which generates a hypothetical answer and uses it to guide document selection.
  • Multilingual Reranking with Infinity
    OpenRAG can optionally rerank with GTE or Jina v2 using Infinity Inference Server to filter top candidates by semantic relevance across languages and formats.
  • LLM Integration and API Compatibility
    OpenRAG is LLM-agnostic, supporting Mistral, GPT-4, Claude, and more for chat-based interactions. Its fully OpenAI-compatible chat API enables seamless integration with existing infrastructure and tools like LangChain, OpenWebUI, or N8N — no changes required.
  • Automated Evaluation Pipelines
    Built-in clustering (via UMAP + HDBSCAN) lets OpenRAG auto-generate synthetic QA datasets from your indexed documents. A local LLM scores each query-chunk pair to help you tune the retrieval strategy for precision, recall, and coverage — before deploying to production.
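The parallel-processing pattern behind the Ray pipeline can be sketched as a map over chunks. Ray's @ray.remote tasks follow the same shape; this self-contained version uses the standard library's ThreadPoolExecutor as a stand-in, with a toy embedding function:

```python
# Sketch of parallel chunk embedding. ThreadPoolExecutor stands in for
# Ray tasks so the example runs anywhere; the embed() function is a toy
# character-frequency vector, not a real embedding model.
from concurrent.futures import ThreadPoolExecutor

def embed(chunk: str) -> list[float]:
    """Toy embedding: 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

chunks = ["first chunk of a document", "second chunk", "third chunk"]
with ThreadPoolExecutor(max_workers=4) as pool:
    vectors = list(pool.map(embed, chunks))  # one task per chunk

# The Ray equivalent of the same map:
#   @ray.remote
#   def embed(chunk): ...
#   vectors = ray.get([embed.remote(c) for c in chunks])
```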
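Chunk contextualization can be illustrated in a few lines: prepend a short document-level summary to each chunk before it is embedded, so retrieval sees the chunk in context. In this sketch the summary and sample chunks are hand-written placeholders; OpenRAG derives the context with an LLM:

```python
# Sketch of chunk contextualization: each chunk is enriched with a
# document-level summary before embedding. Summary and chunks here are
# hypothetical examples.

def contextualize(chunks: list[str], doc_summary: str) -> list[str]:
    return [f"[Context: {doc_summary}] {c}" for c in chunks]

chunks = ["Revenue grew 12% year over year.", "Headcount stayed flat."]
summary = "2024 annual report of ACME Corp"
enriched = contextualize(chunks, summary)
```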
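One common way to blend a lexical ranking with a semantic one is reciprocal rank fusion (RRF), sketched below with hard-coded toy rankings. OpenRAG would compute the two inputs with BM25 and vector search over Milvus; the fusion method shown is a standard technique, not necessarily OpenRAG's exact formula:

```python
# Sketch of hybrid retrieval via reciprocal rank fusion (RRF): each
# document's fused score is the sum of 1/(k + rank) over the rankings
# it appears in, so documents ranked well by either signal rise.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]     # lexical match order
semantic_ranking = ["doc_b", "doc_d", "doc_a"] # embedding match order
fused = rrf([bm25_ranking, semantic_ranking])
```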
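Because the chat API is OpenAI-compatible, any OpenAI client can talk to it. The sketch below builds a standard chat-completions payload; the base URL and model name are placeholders that depend on your deployment:

```python
# Sketch of an OpenAI-compatible chat request. The URL and model name
# are assumptions for illustration; the payload shape is the standard
# OpenAI chat-completions format.
import json

base_url = "http://localhost:8000/v1"  # assumed deployment address
payload = {
    "model": "openrag",  # model name is deployment-specific
    "messages": [
        {"role": "user", "content": "What does our Q3 report say about churn?"},
    ],
}
body = json.dumps(payload)
# To send it with any HTTP client, e.g.:
#   requests.post(f"{base_url}/chat/completions", data=body,
#                 headers={"Content-Type": "application/json"})
```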
Building Trustworthy Enterprise RAG with Open Source Power
In this in-depth webinar, our team walks you through the process of building and deploying Retrieval-Augmented Generation (RAG) applications using OpenRAG.
Whether you're transitioning from prototypes to real-world production environments, concerned about hallucinations, or facing scalability challenges, this session is designed to help you navigate these complexities.