Individual model probe failures are now tracked separately from router
health failures. Models get a grace period after loading, and persistently
failing models are unloaded and put in cooldown rather than triggering a
full service restart. Only router-level health failures cause restarts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Monitors llama-server health with multi-phase checks (zombie detection,
health endpoint, loaded model probing) and auto-restarts via systemd or
manual relaunch on consecutive failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>