RunbookHermes is a specialized AIOps agent for incident response that has gained 531 stars and 29 forks on GitHub since its creation on May 1, 2026. Built by developer Tommy-yw on the Hermes Agent runtime, the system handles payment system incidents through evidence collection, root-cause analysis, approval-gated remediation, and runbook learning.
Evidence-Driven Architecture Distinguishes Incident Response Approach
RunbookHermes organizes incident data through an 'EvidenceStack' context engine that structures information into alert summaries, key evidence, hypotheses, action plans, and final answers. The system collects data from Prometheus metrics, Loki logs, Jaeger traces, deployment records, and service profiles, avoiding the common practice of flooding AI prompts with raw observability data.
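To make the layering concrete, here is a minimal sketch of what a structured incident context could look like. It is not taken from the RunbookHermes codebase: the class names `EvidenceItem` and `EvidenceStack`, the field names, and the `render` method are all assumptions chosen to mirror the five layers named above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvidenceItem:
    source: str   # e.g. "prometheus", "loki", "jaeger"
    summary: str  # condensed finding, not the raw payload


@dataclass
class EvidenceStack:
    """Hypothetical layered context; fields mirror the layers described in the article."""
    alert_summary: str
    key_evidence: list[EvidenceItem] = field(default_factory=list)
    hypotheses: list[str] = field(default_factory=list)
    action_plan: list[str] = field(default_factory=list)
    final_answer: Optional[str] = None

    def render(self, max_evidence: int = 5) -> str:
        """Build a compact prompt section, capping how much evidence is included."""
        lines = [f"ALERT: {self.alert_summary}", "EVIDENCE:"]
        for item in self.key_evidence[:max_evidence]:
            lines.append(f"- [{item.source}] {item.summary}")
        lines.append("HYPOTHESES:")
        lines.extend(f"- {h}" for h in self.hypotheses)
        lines.append("PLAN:")
        lines.extend(f"- {step}" for step in self.action_plan)
        if self.final_answer is not None:
            lines.append(f"ANSWER: {self.final_answer}")
        return "\n".join(lines)
```

The cap on rendered evidence is the point of the sketch: the model sees a bounded, labeled digest instead of every log line and metric sample.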
The system accepts incidents through multiple channels including web console, Alertmanager webhooks, Feishu/WeCom messaging, and API endpoints. All inputs normalize into a unified incident command workflow that preserves context across investigation stages.
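As an illustration of that normalization step, the sketch below maps two input channels onto one record. The `Incident` type and both function names are hypothetical. The `commonLabels` and `commonAnnotations` keys are part of the standard Alertmanager webhook payload, but the `service` and `severity` label names are assumptions, since label names are defined by each team's alerting rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical unified incident record shared by every intake channel."""
    channel: str
    service: str
    severity: str
    title: str


def from_alertmanager(payload: dict) -> Incident:
    """Map an Alertmanager webhook body onto the unified record."""
    labels = payload.get("commonLabels", {})
    annotations = payload.get("commonAnnotations", {})
    return Incident(
        channel="alertmanager",
        service=labels.get("service", "unknown"),    # assumed label name
        severity=labels.get("severity", "unknown"),  # assumed label name
        title=annotations.get("summary", labels.get("alertname", "untitled alert")),
    )


def from_chat(text: str, service: str) -> Incident:
    """Map a free-text chat report (e.g. from Feishu or WeCom) onto the same record."""
    return Incident(channel="chat", service=service, severity="unknown", title=text.strip())
```

Whatever the entry point, later investigation stages only ever see an `Incident`, which is one simple way to keep context consistent across channels.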
Safety-First Remediation Prevents Blind Automation
High-risk actions require action policy checks, approval requests, checkpoint creation, dry-run execution, controlled rollout, recovery verification, and audit logging. This multi-stage safety system prevents destructive operations from executing without human oversight.
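A rough sketch of such a gated pipeline follows. It is an assumption-laden illustration, not the project's implementation: the `GatedRun` class, the `HIGH_RISK` policy table, and the callback signatures are invented here, and the checkpoint step is reduced to a log entry where a real system would snapshot state.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy table: actions that require human approval.
HIGH_RISK = {"rollback_deployment", "restart_service"}


@dataclass
class GatedRun:
    audit_log: list[str] = field(default_factory=list)

    def execute(
        self,
        action: str,
        approver: Callable[[str], bool],   # asks a human, returns True if approved
        run: Callable[[str, bool], bool],  # run(action, dry_run) -> success
        verify: Callable[[], bool],        # checks that the service recovered
    ) -> bool:
        log = self.audit_log.append
        log(f"policy_check:{action}")
        if action in HIGH_RISK:
            if not approver(action):
                log("approval:denied")
                return False
            log("approval:granted")
        log("checkpoint:created")  # placeholder for a real state snapshot
        if not run(action, True):  # dry run before touching anything
            log("dry_run:failed")
            return False
        log("dry_run:ok")
        if not run(action, False):  # the actual, controlled rollout
            log("rollout:failed")
            return False
        log("rollout:ok")
        recovered = verify()
        log(f"verify:{'recovered' if recovered else 'not_recovered'}")
        return recovered
```

Each stage can stop the run, and every decision lands in the audit log, so a denied approval leaves a trace without the action ever being attempted.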
Optional OpenAI-compatible integration provides readable incident summaries and operator-facing explanations while keeping evidence chains explicit and inspectable. The IncidentMemory provider maintains service profiles, team preferences, incident summaries, recurring causes, runbook skills, and approval requirements, rather than relying on plain chat history, which loses operational context.
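One way to picture the difference from chat history is a store keyed by service rather than by conversation. The sketch below is a toy stand-in: the class reuses the IncidentMemory name for readability, but its methods and internal layout are assumptions and cover only the recurring-causes part of what the provider is described as tracking.

```python
from collections import Counter, defaultdict


class IncidentMemory:
    """Toy stand-in: counts root causes per service (hypothetical API)."""

    def __init__(self) -> None:
        self._causes: dict[str, Counter] = defaultdict(Counter)

    def record(self, service: str, root_cause: str) -> None:
        """Store the confirmed root cause of a resolved incident."""
        self._causes[service][root_cause] += 1

    def recurring_causes(self, service: str, min_count: int = 2) -> list[str]:
        """Return causes seen at least min_count times, most frequent first."""
        return [
            cause
            for cause, count in self._causes[service].most_common()
            if count >= min_count
        ]
```

A new incident on the same service can then start from known recurring causes instead of a blank prompt.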
Hermes Runtime Adaptation Enables Production Deployment
The system preserves Hermes' agent runtime loop, provider routing, tool system, memory providers, context engine, skills framework, gateway architecture, and safety boundaries while adapting them for incident response workflows. Built in Python, RunbookHermes supports three deployment modes: Web/API only for UI exploration, a local reference payment environment with a full observability stack, or a production-oriented microservices architecture.
Integration support includes Prometheus, Loki, Jaeger/Tempo, Kubernetes, Argo CD-style adapters, and Chinese messaging platforms Feishu and WeCom for team communication. The system converts successful incident responses into reusable runbook skills, enabling organizational learning from past incidents.
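The conversion from a resolved incident to a reusable skill could be sketched as follows. The `RunbookSkill` type, its fields, and the naming scheme are assumptions for illustration; the article does not describe the actual skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookSkill:
    """Hypothetical reusable skill distilled from one resolved incident."""
    name: str
    trigger: str            # the root cause this skill applies to
    steps: tuple[str, ...]  # remediation steps that worked, in order


def skill_from_incident(
    service: str, root_cause: str, actions: list[tuple[str, bool]]
) -> RunbookSkill:
    """Keep only the actions that succeeded, preserving their order."""
    steps = tuple(action for action, succeeded in actions if succeeded)
    name = f"{service}-{root_cause}".lower().replace(" ", "-")
    return RunbookSkill(name=name, trigger=root_cause, steps=steps)
```

Dropping the failed attempts means the stored skill records what resolved the incident, not everything that was tried along the way.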
Repository Includes Reference Payment Environment
The GitHub repository includes documentation and directories for the agent runtime, runbook-specific profiles and plugins, FastAPI web/API services, observability integrations, skill definitions, a payment system reference environment, and deployment guides. The reference environment lets developers test incident response workflows without access to production systems.
The project distinguishes itself through evidence-driven root-cause analysis rather than purely model-based guessing, approval workflows that prevent rogue automation, and the ability to institutionalize incident response knowledge through the runbook skills framework.
Key Takeaways
- RunbookHermes, an AIOps agent for incident response built on the Hermes Agent runtime, has gained 531 stars on GitHub since its creation on May 1, 2026
- The EvidenceStack context engine organizes observability data into structured alert summaries, evidence, hypotheses, and action plans rather than raw data dumps
- Safety-first remediation requires action policy checks, approval requests, dry-run execution, and audit logging before executing high-risk operations
- The IncidentMemory provider maintains service profiles, recurring causes, and runbook skills to preserve organizational knowledge beyond chat history
- The system supports three deployment modes: Web/API only, a local reference environment with a full observability stack, or a production microservices architecture