Add a separate llm_chat client so chat responses use a smarter model
(gpt-4o-mini) while analysis stays on the cheap local Qwen3-8B.
Falls back to llm_heavy if LLM_CHAT_MODEL is not set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Triage analysis runs on Qwen 8B (athena.lan) for free first-pass.
Escalation, chat, image roasts, and commands use GPT-4o via OpenAI.
Each tier gets its own base URL, API key, and concurrency settings.
Local models get /no_think and serialized requests automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Default models: gpt-4o-mini (triage), gpt-4o (escalation)
- Remove Qwen-specific /no_think hacks
- Reduce timeout from 600s to 120s, increase concurrency semaphore to 4
- Support empty LLM_BASE_URL to use OpenAI directly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a BotSettings key-value table. The active mode is saved
when changed via /bcs-mode and restored on startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a server-wide mode system with /bcs-mode command.
- Default: current hall-monitor behavior unchanged
- Chatty: friendly chat participant with proactive replies (~10% chance)
- Roast: savage roast mode with proactive replies
- Chatty/roast use relaxed moderation thresholds
- 5-message cooldown between proactive replies per channel
- Bot status updates to reflect active mode
- /bcs-status shows current mode and effective thresholds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log every LLM call (analysis, chat, image, raw_analyze) to a new
LlmLog table with request type, model, token counts, duration,
success/failure, and truncated request/response payloads. Enables
debugging prompt issues and tracking usage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Triage model (LLM_MODEL) handles every message cheaply. If toxicity
>= 0.25, off_topic, or coherence < 0.6, the message is re-analyzed
with the heavy model (LLM_ESCALATION_MODEL). Chat, image analysis,
/bcs-test, and /bcs-scan always use the heavy model.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends a minimal 1-token completion during setup_hook so the model is
ready before Discord messages start arriving, avoiding connection
errors and slow first responses after a restart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Serialize all LLM requests through an asyncio semaphore to prevent
overloading athena with concurrent requests
- Switch chat() to streaming so the typing indicator only appears once
the model starts generating (not during thinking/loading)
- Increase LLM timeout from 5 to 10 minutes for slow first loads
- Rename ollama_client.py to llm_client.py and self.ollama to self.llm
since the bot uses a generic OpenAI-compatible API
- Update embed labels from "Ollama" to "LLM"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Discord bot for monitoring chat sentiment and tracking drama using
Ollama LLM on athena.lan. Includes sentiment analysis, slash commands,
drama tracking, and SQL Server persistence via Docker Compose.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>