LLM Incident Copilot
Evidence-grounded RAG for production log debugging.
Problem
Production incident debugging requires reading through multi-megabyte log files to find root causes. Existing AI tools generate plausible-sounding diagnoses but hallucinate details — fabricating error messages, timestamps, and causal chains that don't appear in the actual logs. In high-stakes incidents, a wrong diagnosis wastes time and can make the situation worse.
Approach
The LLM Incident Copilot is a RAG system with a strict evidence requirement: every AI conclusion must cite actual log lines with timestamps. Logs are chunked, embedded, and stored in a dual vector store (FAISS for local speed, Qdrant for persistence). At query time, relevant chunks are retrieved, re-ranked, and passed to the LLM with instructions to only make claims supported by the provided evidence. An evidence guardrail layer validates that citations correspond to real log entries.
┌─────────────────────────────────────────────────────────┐
│                       User Query                        │
│          "Why did the payment service crash?"           │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                      Query Engine                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │            Dual Vector Store Retrieval            │  │
│  │   ┌──────────┐        ┌──────────────┐            │  │
│  │   │  FAISS   │        │    Qdrant    │            │  │
│  │   │ (local)  │        │ (persistent) │            │  │
│  │   └────┬─────┘        └──────┬───────┘            │  │
│  │        └──────────┬──────────┘                    │  │
│  │          Merged + Re-ranked                       │  │
│  └───────────────────┬───────────────────────────────┘  │
│                      │                                  │
│  ┌───────────────────▼───────────────────────────────┐  │
│  │                LLM Inference Layer                │  │
│  │   ┌──────────┐        ┌──────────┐                │  │
│  │   │  Ollama  │        │   Groq   │                │  │
│  │   │ (local)  │        │ (cloud)  │                │  │
│  │   └──────────┘        └──────────┘                │  │
│  └───────────────────┬───────────────────────────────┘  │
│                      │                                  │
│  ┌───────────────────▼───────────────────────────────┐  │
│  │                Evidence Guardrails                │  │
│  │    Every claim must cite log line + timestamp     │  │
│  │     Unsupported claims are flagged or removed     │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

How it works
Log Ingestion & Chunking
Production logs (multi-MB files) are ingested and split into chunks preserving temporal locality — each chunk retains its timestamp range and service context. Chunks are embedded using sentence-transformers and stored in both FAISS (for fast local retrieval) and Qdrant (for persistent, filtered queries).
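As a minimal sketch of the time-window chunking idea (the function and field names here are illustrative, not the project's actual API; the 60-second window is an assumed default):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class LogChunk:
    service: str                      # service context retained per chunk
    start: datetime                   # timestamp range the chunk covers
    end: datetime
    lines: list[str] = field(default_factory=list)

def chunk_by_time_window(entries, window_seconds=60):
    """Group parsed (timestamp, service, line) entries into fixed time
    windows, one chunk per (service, window) pair, preserving temporal
    locality. Each retained line keeps its original timestamp."""
    chunks: dict[tuple, LogChunk] = {}
    for ts, service, line in entries:
        bucket = ts.timestamp() // window_seconds
        key = (service, bucket)
        if key not in chunks:
            start = datetime.fromtimestamp(bucket * window_seconds)
            chunks[key] = LogChunk(service, start,
                                   start + timedelta(seconds=window_seconds))
        chunks[key].lines.append(f"{ts.isoformat()} {line}")
    return list(chunks.values())
```

Each chunk's text (its joined lines) would then be embedded with sentence-transformers, while `service`, `start`, and `end` become the metadata used for filtered queries.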
Dual Vector Store
FAISS provides sub-millisecond local retrieval for interactive debugging sessions. Qdrant adds persistent storage with metadata filtering (by service, severity, time range). At query time, results from both stores are merged and re-ranked by relevance.
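The merge step can be sketched as follows. This is a simplified illustration: it assumes each store returns (chunk_id, similarity) pairs and re-ranks by raw similarity, whereas a production re-ranker would typically apply a cross-encoder over the query/chunk pairs.

```python
def merge_and_rerank(faiss_hits, qdrant_hits, top_k=5):
    """Merge hits from both stores (lists of (chunk_id, similarity)
    pairs, higher = more relevant), dedupe by chunk id keeping the
    best score, and return the top_k chunk ids by score."""
    best = {}
    for chunk_id, score in faiss_hits + qdrant_hits:
        if chunk_id not in best or score > best[chunk_id]:
            best[chunk_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

Deduplication matters here because the same chunk is indexed in both stores, so it frequently appears in both result lists.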
Evidence Guardrails
The core differentiator: the LLM is prompted with strict citation requirements. Every diagnostic claim must reference a specific log line with its timestamp. The guardrail layer post-processes the LLM output and validates that cited log lines actually exist in the retrieved context. Claims without valid citations are flagged.
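A minimal sketch of the citation check, assuming a bracketed ISO-8601 citation format like `[2025-01-01T12:00:00]` (the exact format and function names are assumptions for illustration):

```python
import re

# Matches a bracketed ISO-8601 timestamp citation, e.g. [2025-01-01T12:00:00]
CITATION = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\]")

def validate_claims(answer_lines, retrieved_context):
    """Treat each line of the LLM answer as one claim. A claim is
    supported only if it cites at least one timestamp and every cited
    timestamp actually appears in the retrieved log context."""
    context_ts = set(CITATION.findall(retrieved_context))
    supported, flagged = [], []
    for claim in answer_lines:
        cited = CITATION.findall(claim)
        if cited and all(ts in context_ts for ts in cited):
            supported.append(claim)
        else:
            flagged.append(claim)
    return supported, flagged
```

Because this is pure string matching over the already-retrieved context, it runs in microseconds, which is what keeps the guardrail from adding visible latency.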
Dual Inference Backends
Supports both local inference via Ollama (for air-gapped or privacy-sensitive environments) and cloud inference via Groq (for faster responses with larger models). The backend is swappable without changing the retrieval pipeline.
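The swap works because the pipeline depends only on a small interface rather than a concrete client. A sketch of that shape (class names, model names, and the stubbed responses are hypothetical; real implementations would call the Ollama and Groq HTTP APIs):

```python
from typing import Protocol

class InferenceBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class OllamaBackend:
    """Local backend stub; a real version would POST to a local Ollama server."""
    def __init__(self, model: str = "llama3"):
        self.model = model
    def complete(self, prompt: str) -> str:
        return f"[ollama:{self.model}] response"

class GroqBackend:
    """Cloud backend stub; a real version would call Groq's API with a key."""
    def __init__(self, model: str = "llama-3.1-70b"):
        self.model = model
    def complete(self, prompt: str) -> str:
        return f"[groq:{self.model}] response"

def diagnose(query: str, context: str, backend: InferenceBackend) -> str:
    """Retrieval and prompting are identical for every backend; only the
    final completion call is dispatched through the Protocol."""
    prompt = f"Evidence:\n{context}\n\nQuestion: {query}\nCite log lines."
    return backend.complete(prompt)
```

Swapping backends is then a one-argument change at the call site, with no edits to chunking, retrieval, or the guardrail layer.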
Metrics
Tech stack
Core: sentence-transformers (embeddings)
Vector Stores: FAISS, Qdrant
LLM Inference: Ollama, Groq
Frontend
Infrastructure
Lessons learned
The evidence guardrail is the entire value proposition — without it, this is just another chatbot over logs. The hardest part was making the citation check fast enough to not degrade the user experience. Chunking strategy matters enormously: too small and you lose context, too large and retrieval precision drops. I ended up with a time-window-based chunking strategy that preserves temporal locality.
Timeline
Built 2025–2026. Designed as a response to hallucination problems in production debugging tools. Open source.