Version: 2.4.5 (Latest)

Concept: rag-connectors

The framework's connector layer is the boundary between the pipeline and the outside world: embedding providers, vector stores, LLM APIs, document loaders. Every connector implements a plugin contract defined as a JSON schema in contracts/.

What's in rag-connectors

Embedders

| Connector | Notes |
| --- | --- |
| OpenAIEmbedder | Calls OpenAI's embeddings API. Configurable model (text-embedding-3-small, -3-large, ada-002). Batch + retry built in. |
| CohereEmbedder | Calls Cohere's embed API. Supports the embed-multilingual-v3.0 family. |
| LocalTfidfEmbedder | Pure-JS TF-IDF for development, tests, and small corpora. No network. |
| Custom (your code) | Implement the embedder contract: one async embed(texts: string[]) method returning number[][]. |
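
As a rough illustration of the custom embedder shape described above, the sketch below builds a toy TF-IDF embedder exposing a single async embed(texts) that resolves to number[][]. The tokenizer, the smoothed IDF variant, and the name toyTfidfEmbedder are assumptions made for this example; it is not the implementation of LocalTfidfEmbedder.

```javascript
// Toy TF-IDF embedder. Illustrative only: not the package's LocalTfidfEmbedder.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

const toyTfidfEmbedder = {
  name: "toy-tfidf-embedder", // hypothetical name
  type: "embedder",
  version: "0.0.1",

  // Resolves to one vector per input text; dimensions follow the sorted vocabulary of this batch.
  async embed(texts) {
    const docs = texts.map(tokenize);
    const vocab = [...new Set(docs.flat())].sort();

    // Document frequency: number of texts that contain each term.
    const df = new Map(vocab.map((term) => [term, 0]));
    for (const tokens of docs) {
      for (const term of new Set(tokens)) df.set(term, df.get(term) + 1);
    }

    return docs.map((tokens) =>
      vocab.map((term) => {
        const tf = tokens.filter((t) => t === term).length / (tokens.length || 1);
        const idf = Math.log((1 + docs.length) / (1 + df.get(term))) + 1; // smoothed IDF (assumed variant)
        return tf * idf;
      })
    );
  },
};
```

Because the vocabulary is rebuilt on every call, vectors from different calls are not comparable. A usable embedder would fix the vocabulary (or use a model with a fixed dimension) before indexing.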

Retrievers / Vector stores

| Connector | Notes |
| --- | --- |
| MemoryRetriever | In-process cosine-similarity store. Honors tenantId for multi-tenant scoping. Production-suitable for small corpora (under 100k chunks). |
| Pinecone, Weaviate, Qdrant, pgvector | Not bundled. Build your own retriever using the documented contract; see Examples for recipes. |
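
To make "in-process cosine-similarity store" with tenant scoping concrete, here is a minimal sketch of the technique. The class name TinyMemoryStore, the add/search method names, and the record shape are hypothetical; the retriever contract in contracts/ defines the real interface and is not reproduced here.

```javascript
// Cosine similarity between two equal-length vectors; returns 0 if either vector is all zeros.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory store; not the package's MemoryRetriever.
class TinyMemoryStore {
  constructor() {
    this.records = [];
  }

  add({ id, vector, tenantId }) {
    this.records.push({ id, vector, tenantId });
  }

  // Returns up to topK records for one tenant, best match first.
  search(queryVector, { tenantId, topK = 3 }) {
    return this.records
      .filter((r) => r.tenantId === tenantId) // never score another tenant's chunks
      .map((r) => ({ id: r.id, score: cosineSimilarity(queryVector, r.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

const store = new TinyMemoryStore();
store.add({ id: "a1", vector: [1, 0], tenantId: "acme" });
store.add({ id: "a2", vector: [0, 1], tenantId: "acme" });
store.add({ id: "b1", vector: [1, 0], tenantId: "globex" });

const hits = store.search([1, 0], { tenantId: "acme", topK: 2 });
```

Filtering by tenant before scoring, rather than after, keeps other tenants' vectors out of the ranking entirely. A linear scan like this is also why an in-process store of this kind suits small corpora only.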

LLMs

| Connector | Notes |
| --- | --- |
| OpenAIConnector | OpenAI Chat Completions + streaming. Cost tracking integrated. |
| AnthropicConnector | Anthropic Messages API + streaming. |
| OllamaConnector | Local Ollama server for offline development and self-hosted production. |
| Custom | Implement the llm contract: one generate({ prompt, context, options }) and an optional stream(...). |
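
A custom LLM connector following the description above might look roughly like this offline stub, which can stand in for a real provider in tests. Only the generate({ prompt, context, options }) argument names and the optional stream come from the text; the type value "llm", the array-of-chunks context, the string return value, the maxChars option, and the async-generator form of stream are all assumptions.

```javascript
// Hypothetical offline connector for tests. Return types and option names are assumed,
// not taken from the llm contract.
const echoLlm = {
  name: "echo-llm", // hypothetical name
  type: "llm", // assumed type value
  version: "0.0.1",

  // Builds a deterministic answer from the prompt and the number of context chunks.
  async generate({ prompt, context = [], options = {} }) {
    const maxChars = options.maxChars ?? 200; // hypothetical option
    const answer = `Q: ${prompt} | context chunks: ${context.length}`;
    return answer.slice(0, maxChars);
  },

  // Optional streaming variant: yields the same answer word by word.
  async *stream(args) {
    const answer = await this.generate(args);
    for (const word of answer.split(" ")) yield word;
  },
};
```

A caller could consume the stream with `for await (const word of echoLlm.stream({ prompt: "hi" })) { ... }`.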

Loaders

| Connector | Notes |
| --- | --- |
| Document loaders for PDF, DOCX, HTML, Markdown, plain text | Configurable per format. |
| Custom | Implement the loader contract: load(source) → Document[]. |
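
The loader contract above names only load(source) → Document[], so the following is a sketch under stated assumptions: source is taken to be an object of the form { id, text }, each Document is { id, content, metadata }, and load is async. None of these shapes are confirmed by the contract; check contracts/ before relying on them.

```javascript
// Hypothetical loader: treats `source` as { id, text } and emits one Document per paragraph.
// The Document shape ({ id, content, metadata }) is an assumption for this sketch.
const paragraphLoader = {
  name: "paragraph-loader", // hypothetical name
  type: "loader", // assumed type value
  version: "0.0.1",

  async load(source) {
    return source.text
      .split(/\n\s*\n/) // blank lines separate paragraphs
      .map((paragraph) => paragraph.trim())
      .filter((paragraph) => paragraph.length > 0)
      .map((content, index) => ({
        id: `${source.id}#${index}`,
        content,
        metadata: { sourceId: source.id, paragraph: index },
      }));
  },
};
```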

Rerankers

| Connector | Notes |
| --- | --- |
| BM25Reranker | Lexical reranking on top of vector retrieval. |
| EmbeddingReranker | Re-scores using a different embedding model than retrieval. |
| LLMReranker | LLM-judged relevance scoring. Slowest and highest quality. |
| CascadedReranker | Chains the above (cheap first, expensive last) for cost-effective reranking. |
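
The "cheap first, expensive last" idea can be sketched as a loop over scoring stages, where each stage re-scores only the survivors of the previous one. The function cascadeRerank, the { score, keep } stage shape, and both scoring functions are hypothetical stand-ins, not the CascadedReranker API.

```javascript
// Runs stages in order; each stage re-scores the previous survivors and keeps its top `keep`.
// The stage shape ({ score, keep }) is hypothetical.
function cascadeRerank(query, candidates, stages) {
  let survivors = candidates;
  for (const { score, keep } of stages) {
    survivors = survivors
      .map((doc) => ({ doc, s: score(query, doc) }))
      .sort((a, b) => b.s - a.s)
      .slice(0, keep)
      .map(({ doc }) => doc);
  }
  return survivors;
}

// Cheap stage: number of query words present in the document (a crude stand-in for BM25).
const lexicalOverlap = (query, doc) =>
  query.toLowerCase().split(/\s+/).filter((w) => doc.toLowerCase().includes(w)).length;

// Stand-in for an expensive scorer: prefers shorter documents. A real pipeline might call an LLM judge here.
const pretendExpensive = (query, doc) => -doc.length;

const docs = [
  "Vector stores index embeddings for similarity search.",
  "Cooking pasta requires salted water.",
  "Embeddings map text to vectors.",
  "Similarity search over embeddings uses cosine distance in many vector stores today.",
];

const reranked = cascadeRerank("embeddings similarity search", docs, [
  { score: lexicalOverlap, keep: 3 }, // cheap filter over all four candidates
  { score: pretendExpensive, keep: 2 }, // costly scorer only sees three documents
]);
```

The saving comes from the shrinking candidate set: the expensive scorer runs on `keep` documents from the previous stage instead of the full retrieval result.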

Why a connector layer

Three reasons:

  1. Vendor independence. You can swap OpenAI for Anthropic without touching pipeline code. The pipeline doesn't know which LLM it's calling.
  2. Contract enforcement. Every connector must satisfy a JSON schema in contracts/. A plugin that does not satisfy its contract fails loudly at registration, not silently at run time.
  3. Isolated security review. When you add a new provider, the review surface is one file implementing one contract, not a pipeline-wide refactor.

Adding a connector

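import { pluginRegistry } from "@devilsdev/rag-pipeline-utils";

const myEmbedder = {
  name: "vendor-x-embedder",
  type: "embedder",
  version: "1.0.0",
  // Contract method: takes string[] and resolves to number[][] (one vector per text).
  async embed(texts) {
    // callVendorX is a placeholder for your own client code, not a package export.
    const embeddings = await callVendorX(texts);
    return embeddings;
  },
};

pluginRegistry.register(myEmbedder);
// Now usable in createRagPipeline({ embedder: "vendor-x-embedder" })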

The registry validates the plugin against contracts/embedder-contract.json on registration. See Plugins for the full guide.
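
Validation at registration can be pictured with the simplified sketch below. The real check runs against the JSON schema in contracts/embedder-contract.json, which is not reproduced here; the embedderContractSketch object, the helper names, and the use of a Map as the registry are assumptions chosen to keep the example self-contained.

```javascript
// Hypothetical, simplified stand-in for the contract. The real schema lives in
// contracts/embedder-contract.json.
const embedderContractSketch = {
  type: "embedder",
  requiredStrings: ["name", "version"],
  requiredMethods: ["embed"],
};

// Returns a list of human-readable violations; an empty list means the plugin passes.
function findViolations(plugin, contract) {
  const problems = [];
  if (plugin.type !== contract.type) {
    problems.push(`type must be "${contract.type}"`);
  }
  for (const key of contract.requiredStrings) {
    if (typeof plugin[key] !== "string" || plugin[key] === "") {
      problems.push(`missing string field "${key}"`);
    }
  }
  for (const key of contract.requiredMethods) {
    if (typeof plugin[key] !== "function") {
      problems.push(`missing method "${key}"`);
    }
  }
  return problems;
}

// Fail loudly: throw at registration time instead of deferring the error to the first call.
function registerOrThrow(registry, plugin, contract) {
  const problems = findViolations(plugin, contract);
  if (problems.length > 0) {
    throw new Error(`Plugin "${plugin.name ?? "(unnamed)"}" rejected: ${problems.join("; ")}`);
  }
  registry.set(plugin.name, plugin);
}
```

Rejecting an invalid plugin when it is registered means a missing embed method surfaces at startup rather than on the first pipeline run.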

Stability

Connector classes shipped in the package are part of the public API surface and follow the SEMVER policy.

Connector contracts in contracts/ are also stable. Adding new optional fields is non-breaking; removing or renaming required fields is breaking and ships only in major versions.

  • Plugins — full plugin development guide
  • Architecture — internal plugin registry design
  • Examples — recipe for a custom Pinecone retriever, custom embedder, etc.