Add max_tokens=1024 to LLM analysis calls

The analyze_message and raw_analyze methods had no max_tokens limit,
causing thinking models (Qwen3-VL-32B-Thinking) to generate unlimited
reasoning tokens before responding — taking 5+ minutes per message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 14:17:59 -05:00
parent 1151b705c0
commit b410200146

@@ -123,6 +123,7 @@ class LLMClient:
             tools=[ANALYSIS_TOOL],
             tool_choice={"type": "function", "function": {"name": "report_analysis"}},
             temperature=0.1,
+            max_tokens=1024,
         )
         choice = response.choices[0]
@@ -255,6 +256,7 @@ class LLMClient:
             tools=[ANALYSIS_TOOL],
             tool_choice={"type": "function", "function": {"name": "report_analysis"}},
             temperature=0.1,
+            max_tokens=1024,
         )
         choice = response.choices[0]
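
The change amounts to adding one keyword argument to the request parameters both methods share. A minimal sketch, assuming an OpenAI-compatible chat-completions client; the `ANALYSIS_TOOL` schema, `build_analysis_kwargs` helper, and message content here are illustrative stand-ins, not the actual definitions from this codebase:

```python
# Hypothetical tool schema; the real ANALYSIS_TOOL lives elsewhere in the repo.
ANALYSIS_TOOL = {
    "type": "function",
    "function": {
        "name": "report_analysis",
        "parameters": {"type": "object", "properties": {}},
    },
}

def build_analysis_kwargs(messages):
    """Request parameters shared by analyze_message and raw_analyze."""
    return {
        "messages": messages,
        "tools": [ANALYSIS_TOOL],
        "tool_choice": {"type": "function", "function": {"name": "report_analysis"}},
        "temperature": 0.1,
        # Cap the completion so thinking models cannot spend unbounded
        # reasoning tokens before emitting the forced tool call.
        "max_tokens": 1024,
    }

kwargs = build_analysis_kwargs([{"role": "user", "content": "analyze this"}])
# Passed on as: client.chat.completions.create(model=..., **kwargs)
```

Note that for thinking models, `max_tokens` bounds the reasoning tokens as well as the visible output, which is what stops the multi-minute generations described above.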