Add LLM request queue, streaming chat, and rename ollama_client to llm_client
- Serialize all LLM requests through an asyncio semaphore to prevent overloading athena with concurrent requests (see the sketch below this list)
- Switch chat() to streaming so the typing indicator only appears once the model starts generating (not during thinking/loading)
- Increase the LLM timeout from 5 to 10 minutes to tolerate slow first loads
- Rename ollama_client.py to llm_client.py and self.ollama to self.llm, since the bot talks to a generic OpenAI-compatible API
- Update embed labels from "Ollama" to "LLM"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -82,7 +82,7 @@ class SentimentCog(commands.Cog):
         # Analyze the message
         context = self._get_context(message)
         user_notes = self.bot.drama_tracker.get_user_notes(message.author.id)
-        result = await self.bot.ollama.analyze_message(
+        result = await self.bot.llm.analyze_message(
             message.content, context, user_notes=user_notes
         )
