Kubernetes Deployment Guide
Deploy RAG Pipeline Utils on Kubernetes with production-grade configurations, Helm charts, auto-scaling, and comprehensive monitoring.
Overview
This guide covers:
- Kubernetes manifests for production deployment
- Helm charts for templated deployments
- ConfigMaps and Secrets management
- Horizontal Pod Autoscaling (HPA)
- Network policies and security hardening
- Monitoring with Prometheus and Grafana
- Rolling updates and rollback strategies
Prerequisites
- Kubernetes cluster 1.24+
- kubectl configured
- Helm 3.0+
- Basic Kubernetes knowledge
- Container registry access
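A quick way to confirm these prerequisites before proceeding (this assumes your kubeconfig already points at the target cluster):

# Verify cluster access and tool versions
kubectl version
kubectl cluster-info
kubectl get nodes
helm version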
Quick Start
Basic Deployment
Create a Kubernetes Deployment manifest:
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-pipeline
  namespace: production
  labels:
    app: rag-pipeline
    version: v2.3.1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-pipeline
  template:
    metadata:
      labels:
        app: rag-pipeline
        version: v2.3.1
    spec:
      serviceAccountName: rag-pipeline
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
        - name: rag-app
          image: your-registry.com/rag-pipeline-utils:2.3.1
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: NODE_ENV
              value: "production"
            - name: LOG_LEVEL
              value: "info"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: rag-secrets
                  key: openai-api-key
            - name: VECTOR_DB_URL
              valueFrom:
                configMapKeyRef:
                  name: rag-config
                  key: vector-db-url
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
            - name: data
              mountPath: /app/data
      volumes:
        - name: config
          configMap:
            name: rag-config
        - name: data
          persistentVolumeClaim:
            claimName: rag-data-pvc
Deploy:
kubectl apply -f k8s/deployment.yaml
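Note that the production namespace, the rag-config ConfigMap, the rag-secrets Secret, the rag-pipeline ServiceAccount (all shown later in this guide), and the rag-data-pvc PersistentVolumeClaim must exist before the Deployment is applied. The PVC is not defined elsewhere in this guide; a minimal sketch follows, with the storage class and size as assumptions to adjust for your cluster:

# k8s/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rag-data-pvc
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi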
Service Configuration
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rag-pipeline
  namespace: production
  labels:
    app: rag-pipeline
spec:
  type: ClusterIP
  selector:
    app: rag-pipeline
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: metrics
      protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: rag-pipeline-headless
  namespace: production
spec:
  clusterIP: None
  selector:
    app: rag-pipeline
  ports:
    - name: http
      port: 3000
      targetPort: http
ConfigMaps and Secrets
ConfigMap
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rag-config
  namespace: production
data:
  vector-db-url: "http://qdrant:6333"
  redis-url: "redis://redis:6379"
  log-level: "info"
  max-concurrent-requests: "100"
  embedding-batch-size: "100"
  # Application configuration
  app-config.json: |
    {
      "pipeline": {
        "timeout": 30000,
        "retryAttempts": 3,
        "retryDelay": 1000
      },
      "cache": {
        "enabled": true,
        "ttl": 3600
      },
      "rateLimit": {
        "windowMs": 900000,
        "maxRequests": 100
      }
    }
Secrets
# k8s/secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: rag-secrets
  namespace: production
type: Opaque
stringData:
  openai-api-key: "sk-..."
  pinecone-api-key: "..."
  jwt-secret: "your-jwt-secret"
  redis-password: "strong-password"
Avoid committing plaintext Secret manifests like the one above to version control. Create secrets directly from local files, or use sealed secrets:
# Create from files
kubectl create secret generic rag-secrets \
  --from-file=openai-api-key=./secrets/openai.key \
  --from-file=jwt-secret=./secrets/jwt.secret \
  --namespace=production

# Or use sealed secrets (recommended)
kubeseal --format=yaml < secrets.yaml > sealed-secrets.yaml
kubectl apply -f sealed-secrets.yaml
Helm Chart
Chart Structure
helm/rag-pipeline/
├── Chart.yaml
├── values.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── hpa.yaml
│   ├── ingress.yaml
│   ├── serviceaccount.yaml
│   └── NOTES.txt
Chart.yaml
# helm/rag-pipeline/Chart.yaml
apiVersion: v2
name: rag-pipeline
description: RAG Pipeline Utils Helm Chart
version: 2.3.1
appVersion: "2.3.1"
keywords:
  - rag
  - ai
  - llm
  - embeddings
maintainers:
  - name: Ali Kahwaji
    email: ali@example.com
values.yaml
# helm/rag-pipeline/values.yaml
replicaCount: 3

image:
  repository: your-registry.com/rag-pipeline-utils
  tag: "2.3.1"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000
  annotations: {}

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
  hosts:
    - host: rag.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: rag-tls
      hosts:
        - rag.example.com

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

config:
  nodeEnv: production
  logLevel: info
  vectorDbUrl: "http://qdrant:6333"
  redisUrl: "redis://redis:6379"

secrets:
  openaiApiKey: ""
  pineconeApiKey: ""
  jwtSecret: ""

persistence:
  enabled: true
  storageClass: "standard"
  accessMode: ReadWriteOnce
  size: 10Gi

monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
    interval: 30s

# Pod-level security settings
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1001
  fsGroup: 1001

# Container-level security settings (capabilities are only valid here, not at pod level)
securityContext:
  capabilities:
    drop:
      - ALL
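The chart layout and the Helm commands later in this guide reference a values-production.yaml overlay that is not shown. A minimal sketch of what such an overlay might contain; every value here is an illustrative assumption to adapt to your environment:

# helm/rag-pipeline/values-production.yaml
replicaCount: 5

config:
  logLevel: warn

autoscaling:
  minReplicas: 5
  maxReplicas: 20

persistence:
  storageClass: "fast-ssd"
  size: 50Gi

ingress:
  hosts:
    - host: rag.example.com
      paths:
        - path: /
          pathType: Prefix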
Deployment Template
# helm/rag-pipeline/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "rag-pipeline.fullname" . }}
  labels:
    {{- include "rag-pipeline.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "rag-pipeline.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
      labels:
        {{- include "rag-pipeline.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "rag-pipeline.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: NODE_ENV
              value: {{ .Values.config.nodeEnv }}
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel }}
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "rag-pipeline.fullname" . }}
                  key: openai-api-key
            - name: VECTOR_DB_URL
              value: {{ .Values.config.vectorDbUrl }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
            {{- if .Values.persistence.enabled }}
            - name: data
              mountPath: /app/data
            {{- end }}
      volumes:
        - name: config
          configMap:
            name: {{ include "rag-pipeline.fullname" . }}
        {{- if .Values.persistence.enabled }}
        - name: data
          persistentVolumeClaim:
            claimName: {{ include "rag-pipeline.fullname" . }}
        {{- end }}
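The template above calls named helpers (rag-pipeline.fullname, rag-pipeline.labels, rag-pipeline.selectorLabels, rag-pipeline.serviceAccountName) that live in templates/_helpers.tpl, listed in the chart layout above but not shown in this guide. A minimal sketch modeled on the standard helm create scaffolding; adjust the label scheme if you need it to match the plain manifests earlier in this guide:

# helm/rag-pipeline/templates/_helpers.tpl
{{- define "rag-pipeline.fullname" -}}
{{- if contains .Chart.Name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end }}

{{- define "rag-pipeline.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "rag-pipeline.labels" -}}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{ include "rag-pipeline.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "rag-pipeline.serviceAccountName" -}}
{{ include "rag-pipeline.fullname" . }}
{{- end }}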
Horizontal Pod Autoscaling
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rag-pipeline-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-pipeline
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max
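The http_requests_per_second entry is a custom Pods metric; the HPA can only read it if a custom metrics adapter such as prometheus-adapter is installed alongside Prometheus. A sketch of an adapter rule that could derive it from an http_requests_total counter; the metric name is an assumption about what the application exports, and the rules.custom key follows the prometheus-community/prometheus-adapter chart values:

# prometheus-adapter values excerpt
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'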
Ingress Configuration
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-pipeline
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - rag.example.com
      secretName: rag-tls
  rules:
    - host: rag.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rag-pipeline
                port:
                  number: 80
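The cert-manager.io/cluster-issuer: letsencrypt-prod annotation assumes cert-manager is installed and a ClusterIssuer with that name exists. A typical HTTP-01 issuer sketch; the email address is a placeholder, and older cert-manager releases use class instead of ingressClassName in the solver:

# k8s/clusterissuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx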
Monitoring Stack
ServiceMonitor (Prometheus Operator)
# k8s/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rag-pipeline
  namespace: production
  labels:
    app: rag-pipeline
spec:
  selector:
    matchLabels:
      app: rag-pipeline
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
      scheme: http
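With the ServiceMonitor in place, alerting rules can be shipped the same way via a PrometheusRule. A sketch is shown below; the metric names and thresholds are assumptions, and the labels may need to match your Prometheus ruleSelector:

# k8s/prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rag-pipeline-alerts
  namespace: production
  labels:
    app: rag-pipeline
spec:
  groups:
    - name: rag-pipeline.rules
      rules:
        - alert: RagPipelineHighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "RAG pipeline 5xx error rate above 5% for 10 minutes"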
Grafana Dashboard
# k8s/grafana-dashboard.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rag-pipeline-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  rag-pipeline.json: |
    {
      "dashboard": {
        "title": "RAG Pipeline Metrics",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "rate(http_requests_total[5m])"
              }
            ]
          },
          {
            "title": "P95 Latency",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
              }
            ]
          }
        ]
      }
    }
Deployment Strategies
Rolling Update
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
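With maxUnavailable: 0, zero-downtime rollouts also depend on pods draining cleanly. One common pattern is a short preStop delay plus an explicit termination grace period; this is a sketch, the sleep binary must exist in the image, and the values depend on how long your load balancer takes to stop routing to a terminating pod:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: rag-app
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]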
Blue-Green Deployment
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-pipeline-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-pipeline
      version: blue
  template:
    metadata:
      labels:
        app: rag-pipeline
        version: blue
    spec:
      containers:
        - name: rag-app
          image: rag-pipeline:2.3.0
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-pipeline-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-pipeline
      version: green
  template:
    metadata:
      labels:
        app: rag-pipeline
        version: green
    spec:
      containers:
        - name: rag-app
          image: rag-pipeline:2.3.1
---
# Service switches between blue and green
apiVersion: v1
kind: Service
metadata:
  name: rag-pipeline
spec:
  selector:
    app: rag-pipeline
    version: green  # Switch to blue for rollback
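Traffic is switched by repointing the Service selector. For example, to cut over to blue (or back, for a rollback):

kubectl patch service rag-pipeline -n production \
  -p '{"spec":{"selector":{"app":"rag-pipeline","version":"blue"}}}'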
Helm Deployment Commands
Install
# Install with custom values
helm install rag-pipeline ./helm/rag-pipeline \
  --namespace production \
  --create-namespace \
  --values values-production.yaml \
  --set image.tag=2.3.1 \
  --set secrets.openaiApiKey=$OPENAI_API_KEY

# Validate the rendered manifests with a dry run first
helm install rag-pipeline ./helm/rag-pipeline \
  --namespace production \
  --dry-run --debug
Upgrade
# Upgrade with new values
helm upgrade rag-pipeline ./helm/rag-pipeline \
  --namespace production \
  --values values-production.yaml \
  --set image.tag=2.3.2 \
  --wait --timeout 5m

# Check rollout status
kubectl rollout status deployment/rag-pipeline -n production
Rollback
# List release revisions
helm history rag-pipeline -n production

# Roll back to the previous revision
helm rollback rag-pipeline -n production

# Roll back to a specific revision
helm rollback rag-pipeline 3 -n production
Production Best Practices
Resource Quotas
# k8s/resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "10"
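A LimitRange complements the quota by giving containers default requests and limits when a manifest omits them; the values here are illustrative:

# k8s/limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: production-defaults
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "250m"
        memory: "256Mi"
      default:
        cpu: "500m"
        memory: "512Mi"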
Pod Disruption Budget
# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rag-pipeline-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: rag-pipeline
Network Policy
# k8s/networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rag-pipeline-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: rag-pipeline
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
        # Allow traffic from the ingress controller (adjust to its namespace)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    # DNS resolution (required once an Egress policy is enforced)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
      ports:
        - protocol: TCP
          port: 6333 # Qdrant
        - protocol: TCP
          port: 6379 # Redis
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443 # External APIs
Troubleshooting
Pod Issues
# Check pod status
kubectl get pods -n production -l app=rag-pipeline
# View pod logs
kubectl logs -f deployment/rag-pipeline -n production
# Describe pod
kubectl describe pod <pod-name> -n production
# Execute commands in pod
kubectl exec -it <pod-name> -n production -- /bin/sh
Service Issues
# Check service endpoints
kubectl get endpoints rag-pipeline -n production
# Test service connectivity
kubectl run curl --image=curlimages/curl -it --rm --restart=Never -- \
  curl http://rag-pipeline.production.svc.cluster.local/health
HPA Issues
# Check HPA status
kubectl get hpa -n production
# Describe HPA
kubectl describe hpa rag-pipeline-hpa -n production
# Check metrics server
kubectl top pods -n production
Security Hardening
Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
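The restricted profile requires fields that the Deployment at the top of this guide does not set, notably a seccomp profile, allowPrivilegeEscalation: false, and dropped capabilities at the container level. A sketch of the container-level securityContext needed to pass the restricted checks:

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
      - ALL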
Service Account
# k8s/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rag-pipeline
  namespace: production
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rag-pipeline
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rag-pipeline
  namespace: production
subjects:
  - kind: ServiceAccount
    name: rag-pipeline
    namespace: production
roleRef:
  kind: Role
  name: rag-pipeline
  apiGroup: rbac.authorization.k8s.io
Next Steps
- AWS EKS Deployment - Deploy on AWS Elastic Kubernetes Service
- Azure AKS Deployment - Deploy on Azure Kubernetes Service
- GCP GKE Deployment - Deploy on Google Kubernetes Engine
- Monitoring Guide - Comprehensive monitoring setup