fix: separate context from new messages so prior-cycle chat doesn't inflate scores

The conversation analysis was re-scoring old messages alongside new ones, causing users to get penalized repeatedly for already-scored messages. A "--- NEW MESSAGES ---" separator now marks which messages are new, and the prompt instructs the LLM to score only those.

Also fixes bot-mention detection to require an explicit @mention in message text rather than treating reply-pings as scans (so toxic replies to bot warnings aren't silently skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 15:48:02 -05:00
parent 8734f1883b
commit 7417908142
3 changed files with 31 additions and 17 deletions
@@ -40,8 +40,10 @@ Use the report_analysis tool to report your analysis of the TARGET MESSAGE only.

 CONVERSATION-LEVEL ANALYSIS (when given a CONVERSATION BLOCK instead of a single TARGET MESSAGE):
 When you receive a full conversation block with multiple users, use the report_conversation_scan tool instead:
- Provide ONE finding per user (not per message) — aggregate their behavior across the conversation.
- Weight their average tone and worst message equally when determining the toxicity_score.
+- The conversation block may contain a "--- NEW MESSAGES (score only these) ---" separator. Messages ABOVE the separator are CONTEXT ONLY (already scored in a prior cycle) — do NOT let them inflate scores. Messages BELOW the separator are the NEW messages to score.
+- Provide ONE finding per user who has NEW messages (not per message).
+- Score based ONLY on the user's NEW messages. Use context messages to understand tone and relationships, but do NOT penalize a user for something they said in the context section.
+- If a user's only new message is benign (e.g. "I got the 17.."), score it low regardless of what they said in context.
 - Use the same scoring bands (0.0-1.0) as for single messages.
 - Quote the worst/most problematic snippet in worst_message (max 100 chars, exact quote).
 - Flag off_topic if user's messages are primarily personal drama, not gaming.
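
For illustration, the separator convention described in this prompt hunk could be produced on the caller side roughly as follows. This is a minimal sketch, not the code from this commit: `ChatMessage`, its `already_scored` flag, `build_conversation_block`, and `users_to_score` are hypothetical names, and the `author: text` line format is an assumption. Only the separator string is taken from the prompt text above.

```python
from dataclasses import dataclass
from typing import List

# Separator text as quoted in the prompt hunk above.
NEW_MESSAGES_SEPARATOR = "--- NEW MESSAGES (score only these) ---"


@dataclass
class ChatMessage:
    # Hypothetical message record; the field names are illustrative only.
    author: str
    text: str
    already_scored: bool  # True if a prior analysis cycle covered this message


def build_conversation_block(messages: List[ChatMessage]) -> str:
    """Render context messages first, then the separator, then new messages."""
    context = [m for m in messages if m.already_scored]
    new = [m for m in messages if not m.already_scored]

    lines = [f"{m.author}: {m.text}" for m in context]
    # Emit the separator only when there is something on both sides of it;
    # the prompt says the block "may contain" it, so omitting it is allowed.
    if context and new:
        lines.append(NEW_MESSAGES_SEPARATOR)
    lines.extend(f"{m.author}: {m.text}" for m in new)
    return "\n".join(lines)


def users_to_score(messages: List[ChatMessage]) -> List[str]:
    """Authors with at least one new message: one finding is expected per name."""
    return sorted({m.author for m in messages if not m.already_scored})
```

With a block built this way, a user whose only new message is benign appears below the separator with just that message, while their earlier messages stay above it as context, which matches the scoring rule added in this hunk.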