Switch LLM backend from llama.cpp/Qwen to OpenAI

- Default models: gpt-4o-mini (triage), gpt-4o (escalation)
- Remove Qwen-specific /no_think hacks
- Reduce timeout from 600s to 120s, increase concurrency semaphore to 4
- Support empty LLM_BASE_URL to use OpenAI directly
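The empty-`LLM_BASE_URL` fallback described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code; the helper name `resolve_base_url` is hypothetical, and it assumes the client treats a `None` base URL as "use OpenAI's default endpoint" (the behavior of the official OpenAI SDK):

```python
from typing import Optional


def resolve_base_url(env: dict) -> Optional[str]:
    """Return the configured LLM base URL, or None to use OpenAI directly.

    Hypothetical helper: an unset or empty LLM_BASE_URL falls through to
    None, which the OpenAI SDK interprets as its default API endpoint.
    """
    url = env.get("LLM_BASE_URL", "").strip()
    return url or None
```

Passing the result straight to the client (e.g. `OpenAI(base_url=resolve_base_url(os.environ), ...)`) then makes a self-hosted endpoint purely opt-in.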

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Date:   2026-02-23 12:07:53 -05:00
Parent: a9bc24e48e
Commit: 28fb66d5f9
3 changed files with 24 additions and 31 deletions


@@ -1,6 +1,7 @@
 DISCORD_BOT_TOKEN=your_token_here
-LLM_BASE_URL=http://athena.lan:11434
-LLM_MODEL=Qwen3-VL-32B-Thinking-Q8_0
-LLM_API_KEY=not-needed
+LLM_BASE_URL=
+LLM_MODEL=gpt-4o-mini
+LLM_ESCALATION_MODEL=gpt-4o
+LLM_API_KEY=your_openai_api_key_here
 MSSQL_SA_PASSWORD=YourStrong!Passw0rd
 DB_CONNECTION_STRING=DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost,1433;DATABASE=BreehaviorMonitor;UID=sa;PWD=YourStrong!Passw0rd;TrustServerCertificate=yes
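The timeout and concurrency changes from the commit message (120s timeout, semaphore of 4) live in the other changed files, which are not shown in this hunk. A sketch of how such a wrapper might look, with all names hypothetical and `send_request` standing in for the real API-call coroutine:

```python
import asyncio

LLM_TIMEOUT_S = 120      # reduced from 600s per this commit
LLM_MAX_IN_FLIGHT = 4    # concurrency semaphore raised to 4

_llm_sem = asyncio.Semaphore(LLM_MAX_IN_FLIGHT)


async def call_llm(send_request, prompt: str):
    """Run one LLM request with bounded concurrency and a hard timeout.

    At most LLM_MAX_IN_FLIGHT requests run at once; each is cancelled
    if it exceeds LLM_TIMEOUT_S seconds.
    """
    async with _llm_sem:
        return await asyncio.wait_for(send_request(prompt), timeout=LLM_TIMEOUT_S)
```

The semaphore caps load on the backend while the shorter timeout fails fast, which suits a hosted API better than the long timeout a local llama.cpp server needed.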