OpenRAG

Sovereign, Open Source Retrieval-Augmented Generation

Built by LINAGORA, OpenRAG is a modular framework to explore Retrieval-Augmented Generation (RAG) techniques. Designed for experimentation and transparency, it empowers teams to develop state-of-the-art document-grounded AI systems — with full control of their stack.

“Linagora also researched the best embedder-reranker pair. Given the data it intended to feed its prototype, it calculated metrics from the SciFact dataset. It determined that the most appropriate approach was to pair KaLM-mini-instruct with GTE... or Jina v2, which offers the best performance/latency compromise.”

Silicon.fr, July 2025

Real-World Use Cases

Discover what you can build with OpenRAG — from smart assistants to enterprise automation.

Understand RAG in 5 Minutes

Watch this concise video overview to understand what RAG is and how OpenRAG lets you deploy your own AI assistant on your private documents, images, audio, and more.

Key Features

Open Source & Sovereign

AGPL-licensed, auditable, and community-driven.

LLM-Agnostic

Connect a self-hosted model or a hosted provider (Mistral, Claude, GPT, and others).

Vector Search

With Milvus, segment your knowledge base per user or team.

Multimodal Parsing

Audio transcription, image captioning, PDF layout awareness.

Scalable with Ray

Process, embed, and rerank at cluster scale using distributed tasks.

Modern UIs

Web-based indexer, Chainlit chat interface, and a FastAPI backend exposing an OpenAI-compatible API.

Built on Robust Open Source Foundations

OpenRAG is a modern and extensible Retrieval-Augmented Generation stack that combines best-in-class tools for inference, retrieval, and chunk processing. It's designed for experimentation, with sovereignty and transparency at its core.

Distributed Indexing with Ray

OpenRAG relies on Ray to parallelize ingestion, chunking, and vector embedding across multiple CPU and GPU resources. This allows fast, scalable processing of large and diverse document sets, whether you're working with audio, PDFs, or scanned images.
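
To make the pattern concrete, here is a minimal sketch of fan-out ingestion with Ray. The parse_document and embed_chunks tasks are illustrative placeholders, not OpenRAG's actual pipeline code.

    # Minimal sketch of parallel ingestion with Ray; parse_document and
    # embed_chunks are illustrative placeholders, not OpenRAG internals.
    import ray

    ray.init()

    @ray.remote
    def parse_document(path: str) -> list[str]:
        # Toy parser: one chunk per blank-line-separated paragraph.
        with open(path, encoding="utf-8") as f:
            return [p.strip() for p in f.read().split("\n\n") if p.strip()]

    @ray.remote  # use @ray.remote(num_gpus=1) to pin embedding tasks to a GPU
    def embed_chunks(chunks: list[str]) -> list[list[float]]:
        # Toy embedding; swap in a real embedder (e.g. via sentence-transformers).
        return [[float(len(c))] for c in chunks]

    paths = ["report.txt", "minutes.txt", "faq.txt"]
    chunk_refs = [parse_document.remote(p) for p in paths]      # parsing fans out across CPUs
    vector_refs = [embed_chunks.remote(r) for r in chunk_refs]  # Ray resolves the refs automatically
    vectors = ray.get(vector_refs)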

Smart Chunking and Layout-Aware Parsing

With advanced loaders like Docling and Marker, OpenRAG can parse complex document layouts, including OCR-enhanced PDFs. It supports format-aware chunking, adding metadata, context summaries, and original document titles to each segment for better retrieval quality.
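
The sketch below shows the general idea of metadata-enriched chunking. The Chunk structure and the paragraph-packing splitter are stand-ins, not OpenRAG's internal types.

    # Illustrative metadata-enriched chunking; Chunk and chunk_text are
    # stand-ins, not OpenRAG's internal types.
    from dataclasses import dataclass

    @dataclass
    class Chunk:
        text: str
        doc_title: str        # original document title, kept with every segment
        context_summary: str  # short summary of the surrounding section

    def chunk_text(text: str, doc_title: str, summary: str,
                   max_chars: int = 1200) -> list[Chunk]:
        """Pack paragraphs into segments of roughly max_chars characters."""
        chunks, buf = [], ""
        for para in text.split("\n\n"):
            if buf and len(buf) + len(para) > max_chars:
                chunks.append(Chunk(buf.strip(), doc_title, summary))
                buf = ""
            buf += ("\n\n" + para) if buf else para
        if buf.strip():
            chunks.append(Chunk(buf.strip(), doc_title, summary))
        return chunks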

Hybrid and Contextual Retrieval

By combining semantic search with BM25 keyword matching and Anthropic-style contextual chunking, OpenRAG returns relevant results even when user queries are vague or under-specified. You can also refine chunk preselection with HyDE (Hypothetical Document Embeddings): the LLM first drafts an answer from its parametric knowledge, and retrieval is then re-run against that draft.
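
Two small sketches illustrate these ideas. Reciprocal Rank Fusion is one common way to merge BM25 and vector results (OpenRAG's exact fusion strategy may differ), and the HyDE helper shows the draft-then-retrieve loop; the llm, embed, and search callables are assumptions.

    # Illustrative hybrid fusion and HyDE helpers; llm, embed and search are
    # assumed callables, and RRF is one common fusion choice, not necessarily
    # the one OpenRAG uses internally.
    def rrf_fuse(bm25_hits: list[str], dense_hits: list[str], k: int = 60) -> list[str]:
        """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion."""
        scores: dict[str, float] = {}
        for hits in (bm25_hits, dense_hits):
            for rank, chunk_id in enumerate(hits):
                scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    def hyde_retrieve(query: str, llm, embed, search, top_k: int = 8):
        # 1. Draft an answer from the model's parametric knowledge alone.
        draft = llm(f"Answer briefly, without external context:\n{query}")
        # 2. Search with the draft's embedding, often richer than the raw query.
        return search(embed(draft), top_k=top_k)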

Multilingual Reranking with Infinity

To improve answer accuracy, OpenRAG includes an optional reranking phase using GTE or Jina v2 models hosted on the Infinity inference server. This phase filters the top candidates based on semantic relevance to the user's intent, even across languages or formats.
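
As a rough sketch, a rerank call against an Infinity deployment might look like the following; the endpoint path, payload shape, and model name should be checked against your own setup.

    # Hedged sketch of a rerank request to an Infinity server; verify the URL,
    # payload fields, and model name against your deployment.
    import requests

    def rerank(query: str, candidates: list[str],
               base_url: str = "http://localhost:7997") -> list[str]:
        resp = requests.post(
            f"{base_url}/rerank",
            json={
                "model": "jinaai/jina-reranker-v2-base-multilingual",  # or a GTE reranker
                "query": query,
                "documents": candidates,
            },
            timeout=30,
        )
        resp.raise_for_status()
        results = sorted(resp.json()["results"],
                         key=lambda r: r["relevance_score"], reverse=True)
        return [candidates[r["index"]] for r in results]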

LLM Integration and API Compatibility

OpenRAG speaks the OpenAI API. Whether you're using Mistral, GPT-4, or Claude, you interact through a familiar API surface and can plug into tools like LangChain, OpenWebUI, or N8N without writing adapter code.
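
For example, any standard OpenAI client can point at an OpenRAG deployment; the base URL and model alias below are placeholders for your own instance.

    # Querying an OpenAI-compatible endpoint with the standard openai client;
    # the base_url and model alias are placeholders for your own deployment.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused-for-local")

    resp = client.chat.completions.create(
        model="openrag",  # hypothetical alias exposed by the deployment
        messages=[{"role": "user", "content": "Summarise our 2024 security policy."}],
    )
    print(resp.choices[0].message.content)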

Automated Evaluation Pipelines

Built-in clustering (via UMAP + HDBSCAN) lets OpenRAG auto-generate synthetic QA datasets from your indexed documents. A local LLM scores each query-chunk pair, helping you tune the retrieval strategy for precision, recall, and coverage before deploying to production.
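
The clustering step looks roughly like the sketch below; the file name and parameters are illustrative, not OpenRAG's defaults.

    # Illustrative clustering of chunk embeddings with UMAP + HDBSCAN;
    # the file name and parameters are examples, not OpenRAG defaults.
    import numpy as np
    import umap
    import hdbscan

    embeddings = np.load("chunk_embeddings.npy")   # (n_chunks, dim) matrix from your embedder

    reduced = umap.UMAP(n_components=10, metric="cosine").fit_transform(embeddings)
    labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(reduced)

    # Representative chunks from each cluster can then be handed to a local LLM
    # to draft question/answer pairs for retrieval evaluation.
    for cluster_id in sorted(set(labels) - {-1}):  # -1 marks HDBSCAN noise points
        members = np.where(labels == cluster_id)[0]
        print(f"cluster {cluster_id}: {len(members)} chunks")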

How It Works

RAG architecture diagram

A RAG (Retrieval-Augmented Generation) system first retrieves the documents most relevant to the user's query from a knowledge base, then uses a language model to generate a precise, context-aware response grounded in that retrieved information.
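
In code, the whole loop fits in a few lines; retrieve and llm below stand in for a vector search backend and any OpenAI-compatible model.

    # Minimal retrieve-then-generate loop; retrieve() and llm() stand in for a
    # vector search backend and any OpenAI-compatible model.
    def answer(query: str, retrieve, llm, top_k: int = 5) -> str:
        chunks = retrieve(query, top_k=top_k)          # 1. fetch the most relevant passages
        context = "\n\n".join(chunks)
        prompt = (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return llm(prompt)                             # 2. generate a grounded response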

OpenRAG Webinar: “Building Trustworthy Enterprise RAG with Open Source Power”

In this in-depth webinar, our team walks you through the process of building and deploying Retrieval-Augmented Generation (RAG) applications using OpenRAG.
Whether you're transitioning from prototypes to real-world production environments, concerned about hallucinations, or facing scalability challenges, this session is designed to help you navigate these complexities.

Ondine, the OpenRAG Mascot

Let's Build Together

Want to integrate RAG into your business? Whether you're building an AI assistant, a legal search engine, or multimodal enterprise Q&A, we can help you get there fast, and with full sovereignty.

Contact LINAGORA