Add LLM request queue, streaming chat, and rename ollama_client to llm_client

- Serialize all LLM requests through an asyncio semaphore to prevent
  overloading athena with concurrent requests (first sketch below)
- Switch chat() to streaming so the typing indicator only appears once
  the model starts generating, not while it is thinking or loading
  (second sketch below)
- Increase LLM timeout from 5 to 10 minutes for slow first loads
- Rename ollama_client.py to llm_client.py and self.ollama to self.llm
  since the bot uses a generic OpenAI-compatible API
- Update embed labels from "Ollama" to "LLM"
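
A rough sketch of the request queue, for reviewers: a minimal
illustration only, not the real client internals; `_gate` and
`_request` are invented names here.

    import asyncio

    class LLMClient:
        def __init__(self) -> None:
            # A size-1 semaphore acts as an async mutex: concurrent
            # callers queue here instead of hitting the backend in
            # parallel.
            self._gate = asyncio.Semaphore(1)

        async def _request(self, prompt: str) -> str:
            await asyncio.sleep(0.1)  # stand-in for the real HTTP call
            return f"response to {prompt!r}"

        async def analyze_message(self, prompt: str) -> str:
            async with self._gate:
                return await self._request(prompt)

Every public client method awaits the same gate, so a burst of Discord
events degrades into an orderly queue rather than parallel load.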
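
And the streaming side, assuming the `openai` Python client and
discord.py 2.x; the base URL, API key, model name, and chat() shape
are placeholders, not the bot's actual code.

    import discord
    from openai import AsyncOpenAI

    client = AsyncOpenAI(base_url="http://athena:11434/v1", api_key="unused")

    async def chat(channel: discord.abc.Messageable, messages: list[dict]) -> None:
        stream = await client.chat.completions.create(
            model="some-local-model",  # placeholder
            messages=messages,
            stream=True,
        )
        parts: list[str] = []
        async for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            if delta and not parts:
                # First token has arrived: the model is generating, not
                # loading, so the typing indicator is no longer a lie.
                await channel.typing()
            parts.append(delta)
        await channel.send("".join(parts))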

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit: 1151b705c0
Parent: 645b924011
Date:   2026-02-21 13:45:12 -05:00

5 changed files with 120 additions and 87 deletions
@@ -82,7 +82,7 @@ class SentimentCog(commands.Cog):
         # Analyze the message
         context = self._get_context(message)
         user_notes = self.bot.drama_tracker.get_user_notes(message.author.id)
-        result = await self.bot.ollama.analyze_message(
+        result = await self.bot.llm.analyze_message(
             message.content, context, user_notes=user_notes
         )