llamacpp-watchdog

2 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
aj	174db1e5db	feat: per-model failure tracking to avoid unnecessary full restarts. Individual model probe failures are now tracked separately from router health failures. Models get a grace period after loading, and persistently failing models are unloaded and put in cooldown rather than triggering a full service restart. Only router-level health failures cause restarts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 12:09:07 -05:00
aj	321a43ac81	Initial commit: llama.cpp watchdog service. Monitors llama-server health with multi-phase checks (zombie detection, health endpoint, loaded model probing) and auto-restarts via systemd or manual relaunch on consecutive failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:47:12 -05:00
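
The failure-handling policy described in these two commit messages can be sketched roughly as follows. This is an illustrative reconstruction only, not code from the repository: the class names, method names, and every threshold value (`grace_period`, `model_failure_limit`, `router_failure_limit`, `cooldown`) are assumptions made for the example. It shows router health failures counting toward a full restart, while model probe failures are tracked per model, ignored during a grace period after loading, and answered with an unload plus cooldown.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    NONE = "none"
    UNLOAD_MODEL = "unload_model"
    RESTART_SERVICE = "restart_service"


@dataclass
class ModelState:
    loaded_at: float
    failures: int = 0
    cooldown_until: float = 0.0


@dataclass
class Watchdog:
    # All defaults below are hypothetical; the commits do not state values.
    grace_period: float = 60.0
    model_failure_limit: int = 3
    router_failure_limit: int = 3
    cooldown: float = 300.0
    router_failures: int = 0
    models: dict[str, ModelState] = field(default_factory=dict)

    def mark_loaded(self, name: str, now: float) -> None:
        """Start (or restart) tracking a model; this begins its grace period."""
        self.models[name] = ModelState(loaded_at=now)

    def record_router(self, healthy: bool) -> Action:
        """Only consecutive router-level failures lead to a full restart."""
        if healthy:
            self.router_failures = 0
            return Action.NONE
        self.router_failures += 1
        if self.router_failures >= self.router_failure_limit:
            self.router_failures = 0
            return Action.RESTART_SERVICE
        return Action.NONE

    def record_model(self, name: str, ok: bool, now: float) -> Action:
        """Track one model's probe result without touching the router count."""
        state = self.models.setdefault(name, ModelState(loaded_at=now))
        if now < state.cooldown_until:
            return Action.NONE  # model is cooling down; ignore probes
        if ok:
            state.failures = 0
            return Action.NONE
        if now - state.loaded_at < self.grace_period:
            return Action.NONE  # still within the post-load grace period
        state.failures += 1
        if state.failures >= self.model_failure_limit:
            state.failures = 0
            state.cooldown_until = now + self.cooldown
            return Action.UNLOAD_MODEL
        return Action.NONE
```

In this sketch a caller would invoke `record_router` and `record_model` once per check cycle and act on the returned `Action`. Keeping the two counters separate is what lets a single misbehaving model be unloaded without restarting the whole service.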
