- Remove /no_think override from chat() so Qwen3 reasons before
generating responses (fixes incoherent word-salad replies)
- Analysis and image calls keep /no_think for speed
- Add varied roast style guidance (deadpan, sarcastic, blunt, etc.)
- Explicitly ban metaphors/similes in roast prompt
- Replace metaphor examples with direct roast examples
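The per-call /no_think handling above can be sketched as follows. This is an illustrative helper, not the bot's actual code: Qwen3 treats a `/no_think` marker in the prompt as "skip the reasoning phase", so chat() now omits it while analysis and image calls keep it for speed.

```python
# Hypothetical sketch of per-call /no_think handling; build_prompt and
# its callers are illustrative names, not the bot's real API.

def build_prompt(user_text: str, *, skip_thinking: bool) -> str:
    """Append Qwen3's /no_think directive only where speed matters."""
    return f"{user_text} /no_think" if skip_thinking else user_text

# chat() lets the model reason first; analysis stays fast.
chat_prompt = build_prompt("roast my aim", skip_thinking=False)
analysis_prompt = build_prompt("classify this message", skip_thinking=True)
```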
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Triage analysis runs on Qwen 8B (athena.lan) as a free first pass.

Escalation, chat, image roasts, and commands use GPT-4o via OpenAI.
Each tier gets its own base URL, API key, and concurrency settings.
Local models get /no_think and serialized requests automatically.
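The tiering described above can be sketched as a small settings object per tier; the class and field names here are illustrative, not the bot's actual code, and the `is_local` heuristic is an assumption.

```python
# Illustrative per-tier client settings: each tier carries its own
# base URL, API key, model, and concurrency limit, matching the
# commit text above. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class LlmTier:
    base_url: str          # e.g. local athena.lan endpoint vs OpenAI
    api_key: str
    model: str
    max_concurrency: int   # size of the per-tier request semaphore

    @property
    def is_local(self) -> bool:
        # Local (athena.lan) tiers get /no_think and serialized
        # requests; this hostname check is an assumed heuristic.
        return "athena.lan" in self.base_url

triage = LlmTier("http://athena.lan:11434/v1", "unused", "qwen3:8b", 1)
escalation = LlmTier("https://api.openai.com/v1", "sk-...", "gpt-4o", 4)
```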
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Default models: gpt-4o-mini (triage), gpt-4o (escalation)
- Remove Qwen-specific /no_think hacks
- Reduce timeout from 600s to 120s, increase concurrency semaphore to 4
- Support empty LLM_BASE_URL to use OpenAI directly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vision model request was hanging indefinitely, freezing the bot.
The streaming loop had no timeout so if the model never returned
chunks, the bot would wait forever. Now times out after 2 minutes.
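The fix above can be sketched as wrapping the whole streaming loop in one deadline. This assumes chunks arrive as already-decoded text pieces from an async iterator; `collect_stream` and the chunk shape are illustrative, not the bot's exact code.

```python
# Sketch of bounding the streaming loop with asyncio.wait_for, so a
# model that never returns chunks can no longer freeze the bot.
import asyncio

async def collect_stream(stream, timeout: float = 120) -> str:
    """Accumulate streamed text chunks, aborting if the model stalls.

    The 120 s default mirrors the 2-minute cap described above.
    """
    parts: list[str] = []

    async def _drain():
        async for chunk in stream:
            parts.append(chunk)

    # One overall deadline for the entire response, not per chunk.
    await asyncio.wait_for(_drain(), timeout=timeout)
    return "".join(parts)
```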
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LLM analysis now detects when two users are in a genuine
disagreement. When detected, the bot creates a native Discord
poll with each user's position as an option.
- Disagreement detection added to LLM analysis tool schema
- Polls last 4 hours with 1 hour per-channel cooldown
- LLM extracts topic, both positions, and usernames
- Configurable via polls section in config.yaml
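A hypothetical shape for that polls section in config.yaml, matching the behavior described above; the key names are illustrative assumptions, not the bot's documented schema.

```yaml
# Illustrative polls config; key names are assumed.
polls:
  enabled: true
  duration_hours: 4            # native Discord poll length
  channel_cooldown_minutes: 60 # at most one poll per channel per hour
```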
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log every LLM call (analysis, chat, image, raw_analyze) to a new
LlmLog table with request type, model, token counts, duration,
success/failure, and truncated request/response payloads. Enables
debugging prompt issues and tracking usage.
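The row shape described above can be sketched as a plain record plus a truncation helper; field names follow the commit text, but the actual ORM model and the truncation limit are assumptions.

```python
# Illustrative LlmLog row and payload truncation; the 2000-char limit
# is an assumed value, not taken from the bot's code.
from dataclasses import dataclass

MAX_PAYLOAD = 2000

def truncate(payload: str, limit: int = MAX_PAYLOAD) -> str:
    """Trim oversized request/response payloads before logging."""
    return payload if len(payload) <= limit else payload[:limit] + "…"

@dataclass
class LlmLog:
    request_type: str      # "analysis" | "chat" | "image" | "raw_analyze"
    model: str
    prompt_tokens: int
    completion_tokens: int
    duration_ms: int
    success: bool
    request: str           # truncated request payload
    response: str          # truncated response payload
```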
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Send last ~8 messages from all users (not just others) as a
multi-line chat log with relative timestamps so the LLM can
better understand conversation flow and escalation patterns.
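The chat-log formatting above can be sketched like this; the exact line layout is an assumption, the point is one line per message with a relative timestamp, capped at the last ~8 messages.

```python
# Illustrative chat-log formatter with relative timestamps; the
# "[Nm ago] author: text" layout is assumed, not the bot's exact
# prompt format.
from datetime import datetime, timedelta, timezone

def format_chat_log(messages, now=None) -> str:
    """messages: (author, text, sent_at) tuples, oldest first."""
    now = now or datetime.now(timezone.utc)
    lines = []
    for author, text, sent_at in messages[-8:]:
        mins = int((now - sent_at).total_seconds() // 60)
        lines.append(f"[{mins}m ago] {author}: {text}")
    return "\n".join(lines)
```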
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detect when users discuss a game in the wrong channel (e.g. GTA talk
in #warzone) and send a friendly redirect to the correct channel.
Also add sexual_vulgar category and scoring rules so crude sexual
remarks directed at someone aren't softened by "lmao".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When @mentioned with an image attachment, the bot now roasts players
based on scoreboard screenshots using the vision model. Text-only
mentions continue to work as before.
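The vision request above uses the OpenAI-compatible chat format the bot already targets; base64-encoding the Discord attachment into a data URL is an assumption about the transport, and the helper name is illustrative.

```python
# Sketch of an OpenAI-style vision message: a text part plus an
# image_url part carrying the screenshot as a base64 data URL.
import base64

def vision_messages(prompt: str, image_bytes: bytes,
                    mime: str = "image/png") -> list[dict]:
    b64 = base64.b64encode(image_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }]
```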
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The analyze_message and raw_analyze methods had no max_tokens limit,
causing thinking models (Qwen3-VL-32B-Thinking) to generate unlimited
reasoning tokens before responding — taking 5+ minutes per message.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Serialize all LLM requests through an asyncio semaphore to prevent
overloading athena with concurrent requests
- Switch chat() to streaming so the typing indicator only appears once
the model starts generating (not during thinking/loading)
- Increase LLM timeout from 5 to 10 minutes for slow first loads
- Rename ollama_client.py to llm_client.py and self.ollama to self.llm
since the bot uses a generic OpenAI-compatible API
- Update embed labels from "Ollama" to "LLM"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>