Add max_tokens=1024 to LLM analysis calls
The analyze_message and raw_analyze methods set no max_tokens limit, so thinking models (e.g. Qwen3-VL-32B-Thinking) could generate unbounded reasoning tokens before responding, taking 5+ minutes per message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
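The shape of the patched call can be sketched as follows. This is a minimal sketch assuming an OpenAI-compatible chat-completions client; the `ANALYSIS_TOOL` schema and `build_analysis_request` helper here are simplified placeholders, not the real definitions from the repository.

```python
# Placeholder for the real report_analysis tool schema used by LLMClient.
ANALYSIS_TOOL = {
    "type": "function",
    "function": {
        "name": "report_analysis",
        "parameters": {"type": "object", "properties": {}},
    },
}

def build_analysis_request(model: str, messages: list) -> dict:
    """Assemble kwargs for client.chat.completions.create(**kwargs)."""
    return {
        "model": model,
        "messages": messages,
        "tools": [ANALYSIS_TOOL],
        # Force the model to answer via the report_analysis tool.
        "tool_choice": {
            "type": "function",
            "function": {"name": "report_analysis"},
        },
        "temperature": 0.1,
        # Cap generated tokens (including reasoning tokens on thinking
        # models) so a single analysis call cannot run for minutes.
        "max_tokens": 1024,
    }
```

With OpenAI-compatible servers, `max_tokens` bounds the completion (reasoning plus answer), which is why it stops a thinking model from deliberating indefinitely before emitting the tool call.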
@@ -123,6 +123,7 @@ class LLMClient:
             tools=[ANALYSIS_TOOL],
             tool_choice={"type": "function", "function": {"name": "report_analysis"}},
             temperature=0.1,
+            max_tokens=1024,
         )

         choice = response.choices[0]
@@ -255,6 +256,7 @@ class LLMClient:
             tools=[ANALYSIS_TOOL],
             tool_choice={"type": "function", "function": {"name": "report_analysis"}},
             temperature=0.1,
+            max_tokens=1024,
         )

         choice = response.choices[0]