Version: 2.4.5 (Latest)

Concept: rag-guardrails

If rag-eval tells you whether an answer is trustworthy, rag-guardrails is what stops bad inputs from becoming bad outputs in the first place. It's the framework's safety layer — applied at three points in the pipeline.

The three layers

User query
        │
        ▼
┌────────────────┐
│ pre-retrieval  │  Prompt injection? Off-topic? Too long?
│ guardrails     │  → reject before any embedding or LLM spend
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ retrieval      │  Hybrid vector + BM25 with RRF
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ retrieval-time │  Min relevance score? Tenant ACL?
│ filter         │  → reject if no chunk meets the bar
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ generation     │  LLM produces the answer
└───────┬────────┘
        │
        ▼
┌────────────────┐
│ post-gen       │  PII in output? Grounded in citations?
│ guardrails     │  → redact, refuse, or pass with warning
└───────┬────────┘
        │
        ▼
Response

Each layer is independently configurable. Skip a layer and the others still run.
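The retrieval box above mentions RRF (reciprocal rank fusion) for merging the vector and BM25 result lists. As a refresher, here is a minimal self-contained sketch of the technique, not the library's internal implementation: each list contributes `1 / (k + rank)` per document, and documents are re-ranked by the summed score.

```typescript
// Minimal reciprocal rank fusion: merge ranked lists of document ids by
// summing 1 / (k + rank) across lists. k = 60 is the common default.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const vectorHits = ["doc3", "doc1", "doc7"];
const bm25Hits = ["doc1", "doc9", "doc3"];
console.log(rrfFuse([vectorHits, bm25Hits]));
// → ["doc1", "doc3", "doc9", "doc7"]
```

Note how doc1 wins overall despite topping only one list: appearing near the top of both lists beats a single first-place finish, which is exactly the behavior hybrid retrieval wants.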

What's in rag-guardrails

| Surface | Purpose |
| --- | --- |
| `GuardrailsPipeline` | Wraps a `Pipeline` and adds the three guardrail layers. |
| Pre-retrieval: injection detection | Pattern-matching + heuristic detection of jailbreak / prompt-injection attempts. |
| Pre-retrieval: topic filter | Cosine-similarity gate against a configurable allowed-topics embedding set. |
| Pre-retrieval: query-length limit | Hard limit before embedding spend. |
| Retrieval-time: minimum relevance | Drop chunks below threshold. |
| Retrieval-time: ACL filter | Honor `tenantId` / `aclTags` from the retrieval options. |
| Retrieval-time: max context size | Token cap. |
| Post-generation: PII detection | Regex + named-entity heuristics for emails, phones, SSNs, credit cards. |
| Post-generation: groundedness check | Reject answers below a groundedness threshold. |
| Post-generation: classifier hook | Pluggable toxicity / safety classifier slot. |
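To make the PII row concrete, here is a toy sketch of what a regex-based pass can look like. These patterns and the `redactPII` helper are illustrative only, not the library's actual detectors; as noted above, production detection layers named-entity heuristics on top of regexes.

```typescript
// Toy regex-based PII pass for emails, US-style phone numbers, and SSNs.
// Real detection needs many more patterns plus named-entity heuristics.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  usPhone: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

function redactPII(text: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
    redacted = redacted.replace(pattern, () => {
      found.push(kind);
      return `[${kind.toUpperCase()} REDACTED]`;
    });
  }
  return { redacted, found };
}

const out = redactPII("Reach me at jane@example.com or 555-867-5309.");
console.log(out.redacted);
// → "Reach me at [EMAIL REDACTED] or [USPHONE REDACTED]."
```

This also shows the classic failure mode: a 3-3-4 phone number is structurally close to a 3-2-4 SSN, so pattern order and word boundaries matter.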

Wiring it up

import {
  createRagPipeline,
  GuardrailsPipeline,
  MemoryRetriever,
  OpenAIConnector,
} from "@devilsdev/rag-pipeline-utils";

const base = createRagPipeline({
  retriever: new MemoryRetriever(),
  llm: new OpenAIConnector({ apiKey: process.env.OPENAI_API_KEY }),
});

const safe = new GuardrailsPipeline(base, {
  preRetrieval: {
    enableInjectionDetection: true,
    maxQueryLength: 1000,
    // businessTopicEmbeddings: your precomputed embeddings of allowed topics
    topicAllowlist: { embeddings: businessTopicEmbeddings, threshold: 0.6 },
  },
  retrieval: {
    minRelevanceScore: 0.5,
    maxContextTokens: 4000,
  },
  postGeneration: {
    enablePIIDetection: true,
    minGroundedness: 0.7,
  },
});

const result = await safe.run({ query: userQuestion });
// result.blocked = true if any guardrail rejected
// result.blockReason = "pre_retrieval.injection_detected" etc.
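Callers typically branch on `blocked` and the layer prefix of `blockReason`. The sketch below uses a local stand-in type mirroring the two fields shown above (check the package's exported types for the authoritative shape), and the reason strings other than `pre_retrieval.injection_detected` are hypothetical.

```typescript
// Local stand-in for the result shape shown above; the real exported type
// may carry more fields (citations, scores, etc.).
type GuardrailResult = {
  blocked: boolean;
  blockReason?: string; // e.g. "pre_retrieval.injection_detected"
  answer?: string;
};

function renderResult(result: GuardrailResult): string {
  if (!result.blocked) return result.answer ?? "";
  // Map the layer prefix to a user-safe message; never echo the raw
  // blockReason to untrusted callers.
  const layer = result.blockReason?.split(".")[0];
  switch (layer) {
    case "pre_retrieval":
      return "Sorry, I can't help with that request.";
    case "retrieval":
      return "I couldn't find relevant sources for that question.";
    case "post_generation":
      return "I generated an answer but couldn't verify it was safe to show.";
    default:
      return "Request blocked.";
  }
}
```

Logging the raw `blockReason` server-side while returning only the generic message is the usual split for untrusted callers.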

When to enable which layer

| Scenario | Pre-retrieval | Retrieval-time | Post-gen |
| --- | --- | --- | --- |
| Internal employee tool, trusted users | optional | recommended (relevance) | optional |
| Customer-facing chatbot | required | required | required |
| Public API, untrusted callers | required + rate limit | required | required + classifier |
| Compliance-regulated (HIPAA, GDPR, PCI) | required | required + ACL | required + audit log |
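One way to encode these recommendations is a set of per-scenario presets using the config shape from the wiring example. The `PRESETS` helper and its threshold values are illustrative starting points, not part of the package; rate limiting, the classifier hook, and audit logging from the table are wired separately.

```typescript
// Illustrative per-scenario presets covering the three documented layers.
// Field names follow the wiring example; thresholds are starting points.
type GuardrailsConfig = {
  preRetrieval?: { enableInjectionDetection: boolean; maxQueryLength: number };
  retrieval?: { minRelevanceScore: number; maxContextTokens?: number };
  postGeneration?: { enablePIIDetection: boolean; minGroundedness?: number };
};

const PRESETS: Record<string, GuardrailsConfig> = {
  // Trusted internal users: relevance filtering only.
  internalTool: {
    retrieval: { minRelevanceScore: 0.5 },
  },
  // Customer-facing: all three layers on.
  customerChatbot: {
    preRetrieval: { enableInjectionDetection: true, maxQueryLength: 1000 },
    retrieval: { minRelevanceScore: 0.5, maxContextTokens: 4000 },
    postGeneration: { enablePIIDetection: true, minGroundedness: 0.7 },
  },
  // Untrusted public callers: tighter limits across the board.
  publicApi: {
    preRetrieval: { enableInjectionDetection: true, maxQueryLength: 500 },
    retrieval: { minRelevanceScore: 0.6, maxContextTokens: 4000 },
    postGeneration: { enablePIIDetection: true, minGroundedness: 0.8 },
  },
};
```

Because each layer is independently configurable, omitting a key (as `internalTool` does) simply skips that layer while the others still run.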

What guardrails don't fix

Guardrails reduce a class of risks; they don't eliminate them. See the Security Capabilities matrix for the detailed scorecard. Headlines:

  • Guardrails are defense in depth, not a single point of safety.
  • Pattern-based injection detection catches known patterns; novel attacks may slip through. Combine with output-side checks.
  • PII detection uses heuristics; high-stakes compliance needs a dedicated classifier and human review.
  • Groundedness scoring depends on the LLM judge; it's a confidence signal, not a guarantee.
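To see why groundedness is a confidence signal rather than a guarantee, consider a deliberately simplified scorer: the fraction of answer tokens that appear in the retrieved context. (The library uses an LLM judge, not token overlap; this toy exists only to illustrate the shared failure mode.)

```typescript
// Toy groundedness score: fraction of answer tokens found in the context.
// A high score signals support; it does not prove it.
function tokenOverlapScore(answer: string, context: string): number {
  const tokens = (s: string) =>
    s.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  const ctx = new Set(tokens(context));
  const ans = tokens(answer);
  if (ans.length === 0) return 0;
  return ans.filter((t) => ctx.has(t)).length / ans.length;
}

// Every answer token appears in the context, yet the claim reverses its
// meaning by dropping "not": the score is a perfect 1.
const context = "The rollout was not approved by the board.";
console.log(tokenOverlapScore("The rollout was approved by the board.", context));
// → 1
```

LLM judges are far better at catching this particular negation, but the structural point stands: any scorer can be maximized by an answer it should reject, which is why the scenario table pairs post-gen checks with classifiers and audit logs rather than relying on one signal.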

Stability

GuardrailsPipeline, the configuration shape, and the documented guardrail layers are part of the public API and follow the SEMVER policy.

Detection patterns and judge prompts may improve in patch releases without bumping the major. If you require stable detection behavior, pin the package version.