LLM Incident Copilot
Evidence-grounded RAG for production log debugging.
Problem
Production incident debugging requires reading through multi-megabyte log files to find root causes. Existing AI tools generate plausible-sounding diagnoses but hallucinate details — fabricating error messages, timestamps, and causal chains that don't appear in the actual logs. In high-stakes incidents, a wrong diagnosis wastes time and can make the situation worse.
Approach
The LLM Incident Copilot is a RAG system with a strict evidence requirement: every AI conclusion must cite actual log lines with timestamps. Logs are chunked, embedded, and stored in a dual vector store (FAISS for local speed, Qdrant for persistence). At query time, relevant chunks are retrieved, re-ranked, and passed to the LLM with instructions to only make claims supported by the provided evidence. An evidence guardrail layer validates that citations correspond to real log entries.
┌─────────────────────────────────────────────────────────┐
│                       User Query                        │
│          "Why did the payment service crash?"           │
└────────────────────────────┬────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────┐
│                      Query Engine                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │            Dual Vector Store Retrieval            │  │
│  │   ┌──────────┐        ┌──────────────┐            │  │
│  │   │  FAISS   │        │    Qdrant    │            │  │
│  │   │ (local)  │        │ (persistent) │            │  │
│  │   └────┬─────┘        └──────┬───────┘            │  │
│  │        └──────────┬──────────┘                    │  │
│  │          Merged + Re-ranked                       │  │
│  └───────────────────┬───────────────────────────────┘  │
│                      │                                  │
│  ┌───────────────────▼───────────────────────────────┐  │
│  │                LLM Inference Layer                │  │
│  │   ┌──────────┐        ┌──────────┐                │  │
│  │   │  Ollama  │        │   Groq   │                │  │
│  │   │ (local)  │        │ (cloud)  │                │  │
│  │   └──────────┘        └──────────┘                │  │
│  └───────────────────┬───────────────────────────────┘  │
│                      │                                  │
│  ┌───────────────────▼───────────────────────────────┐  │
│  │                Evidence Guardrails                │  │
│  │    Every claim must cite log line + timestamp     │  │
│  │     Unsupported claims are flagged or removed     │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

How it works
Log Ingestion & Chunking
Production logs (multi-MB files) are ingested and split into chunks preserving temporal locality — each chunk retains its timestamp range and service context. Chunks are embedded using sentence-transformers and stored in both FAISS (for fast local retrieval) and Qdrant (for persistent, filtered queries).
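As a minimal sketch of the time-window chunking idea (the function and field names here are illustrative, not the project's actual API; the 60-second window is an assumed default):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class LogChunk:
    service: str                      # service context retained per chunk
    start: datetime                   # timestamp range the chunk covers
    end: datetime
    lines: list[str] = field(default_factory=list)

def chunk_by_time_window(entries, window_seconds=60):
    """Group parsed (timestamp, service, line) entries into fixed time
    windows, one chunk per (service, window) pair, preserving temporal
    locality. Each retained line keeps its original timestamp."""
    chunks: dict[tuple, LogChunk] = {}
    for ts, service, line in entries:
        bucket = ts.timestamp() // window_seconds
        key = (service, bucket)
        if key not in chunks:
            start = datetime.fromtimestamp(bucket * window_seconds)
            chunks[key] = LogChunk(service, start,
                                   start + timedelta(seconds=window_seconds))
        chunks[key].lines.append(f"{ts.isoformat()} {line}")
    return list(chunks.values())
```

Each chunk's text (its joined lines) would then be embedded with sentence-transformers, while `service`, `start`, and `end` become the metadata used for filtered queries.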
Dual Vector Store
FAISS provides sub-millisecond local retrieval for interactive debugging sessions. Qdrant adds persistent storage with metadata filtering (by service, severity, time range). At query time, results from both stores are merged and re-ranked by relevance.
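The merge step can be sketched as follows. This is a simplified illustration: it assumes each store returns (chunk_id, similarity) pairs and re-ranks by raw similarity, whereas a production re-ranker would typically apply a cross-encoder over the query/chunk pairs.

```python
def merge_and_rerank(faiss_hits, qdrant_hits, top_k=5):
    """Merge hits from both stores (lists of (chunk_id, similarity)
    pairs, higher = more relevant), dedupe by chunk id keeping the
    best score, and return the top_k chunk ids by score."""
    best = {}
    for chunk_id, score in faiss_hits + qdrant_hits:
        if chunk_id not in best or score > best[chunk_id]:
            best[chunk_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]
```

Deduplication matters here because the same chunk is indexed in both stores, so it frequently appears in both result lists.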
Evidence Guardrails
The core differentiator: the LLM is prompted with strict citation requirements. Every diagnostic claim must reference a specific log line with its timestamp. The guardrail layer post-processes the LLM output and validates that cited log lines actually exist in the retrieved context. Claims without valid citations are flagged.
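A minimal sketch of the citation check, assuming a bracketed ISO-8601 citation format like `[2025-01-01T12:00:00]` (the exact format and function names are assumptions for illustration):

```python
import re

# Matches a bracketed ISO-8601 timestamp citation, e.g. [2025-01-01T12:00:00]
CITATION = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\]")

def validate_claims(answer_lines, retrieved_context):
    """Treat each line of the LLM answer as one claim. A claim is
    supported only if it cites at least one timestamp and every cited
    timestamp actually appears in the retrieved log context."""
    context_ts = set(CITATION.findall(retrieved_context))
    supported, flagged = [], []
    for claim in answer_lines:
        cited = CITATION.findall(claim)
        if cited and all(ts in context_ts for ts in cited):
            supported.append(claim)
        else:
            flagged.append(claim)
    return supported, flagged
```

Because this is pure string matching over the already-retrieved context, it runs in microseconds, which is what keeps the guardrail from adding visible latency.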
Dual Inference Backends
Supports both local inference via Ollama (for air-gapped or privacy-sensitive environments) and cloud inference via Groq (for faster responses with larger models). The backend is swappable without changing the retrieval pipeline.
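The swap works because the pipeline depends only on a small interface rather than a concrete client. A sketch of that shape (class names, model names, and the stubbed responses are hypothetical; real implementations would call the Ollama and Groq HTTP APIs):

```python
from typing import Protocol

class InferenceBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class OllamaBackend:
    """Local backend stub; a real version would POST to a local Ollama server."""
    def __init__(self, model: str = "llama3"):
        self.model = model
    def complete(self, prompt: str) -> str:
        return f"[ollama:{self.model}] response"

class GroqBackend:
    """Cloud backend stub; a real version would call Groq's API with a key."""
    def __init__(self, model: str = "llama-3.1-70b"):
        self.model = model
    def complete(self, prompt: str) -> str:
        return f"[groq:{self.model}] response"

def diagnose(query: str, context: str, backend: InferenceBackend) -> str:
    """Retrieval and prompting are identical for every backend; only the
    final completion call is dispatched through the Protocol."""
    prompt = f"Evidence:\n{context}\n\nQuestion: {query}\nCite log lines."
    return backend.complete(prompt)
```

Swapping backends is then a one-argument change at the call site, with no edits to chunking, retrieval, or the guardrail layer.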
Metrics
Tech stack
Core: sentence-transformers (embeddings)
Vector Stores: FAISS, Qdrant
LLM Inference: Ollama, Groq
Frontend
Infrastructure
Lessons learned
The evidence guardrail is the entire value proposition — without it, this is just another chatbot over logs. The hardest part was making the citation check fast enough to not degrade the user experience. Chunking strategy matters enormously: too small and you lose context, too large and retrieval precision drops. I ended up with a time-window-based chunking strategy that preserves temporal locality.
Timeline
Built 2025–2026. Designed as a response to hallucination problems in production debugging tools. Open source.