fix: separate context from new messages so prior-cycle chat doesn't inflate scores

The conversation analysis was re-scoring old messages alongside new
ones, causing users to get penalized repeatedly for already-scored
messages. A "--- NEW MESSAGES ---" separator now marks which messages
are new, and the prompt instructs the LLM to score only those.

Also fixes bot-mention detection to require an explicit @mention in
message text rather than treating reply-pings as scans (so toxic
replies to bot warnings aren't silently skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
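The mention-detection change described in the message is not part of the hunk shown here, so the following is only a minimal sketch of the idea, not the code from this commit. It assumes Discord-style mention tokens (`<@id>` or `<@!id>`) in the raw message text; the function name `mentions_bot_explicitly` and the `bot_user_id` parameter are hypothetical.

```python
import re


def mentions_bot_explicitly(content: str, bot_user_id: int) -> bool:
    """Return True only if the raw message text contains an @mention of the bot.

    Assumption: mentions appear in the text as <@ID> or <@!ID>. A reply to a
    bot message can ping the bot without the text containing such a token, so
    checking the text keeps plain replies in the normal scoring path.
    """
    # The closing ">" stops ID 42 from matching inside a longer ID such as 421.
    pattern = re.compile(rf"<@!?{bot_user_id}>")
    return bool(pattern.search(content))
```

With this check, a plain reply to a bot warning is not treated as a scan request, so it can still be scored like any other message.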
@@ -40,8 +40,10 @@ Use the report_analysis tool to report your analysis of the TARGET MESSAGE only.
 
 CONVERSATION-LEVEL ANALYSIS (when given a CONVERSATION BLOCK instead of a single TARGET MESSAGE):
 When you receive a full conversation block with multiple users, use the report_conversation_scan tool instead:
-- Provide ONE finding per user (not per message) — aggregate their behavior across the conversation.
-- Weight their average tone and worst message equally when determining the toxicity_score.
+- The conversation block may contain a "--- NEW MESSAGES (score only these) ---" separator. Messages ABOVE the separator are CONTEXT ONLY (already scored in a prior cycle) — do NOT let them inflate scores. Messages BELOW the separator are the NEW messages to score.
+- Provide ONE finding per user who has NEW messages (not per message).
+- Score based ONLY on the user's NEW messages. Use context messages to understand tone and relationships, but do NOT penalize a user for something they said in the context section.
+- If a user's only new message is benign (e.g. "I got the 17.."), score it low regardless of what they said in context.
 - Use the same scoring bands (0.0-1.0) as for single messages.
 - Quote the worst/most problematic snippet in worst_message (max 100 chars, exact quote).
 - Flag off_topic if user's messages are primarily personal drama, not gaming.
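The caller-side code that assembles the conversation block is not included in this hunk. As a rough sketch of what the prompt expects, assuming messages are already split into previously scored context and new messages, the block could be built as below. `ChatMessage`, `build_conversation_block`, and `users_with_new_messages` are hypothetical names, and the `author: text` line format is an assumption.

```python
from dataclasses import dataclass

# Separator text taken from the prompt in the diff above.
SEPARATOR = "--- NEW MESSAGES (score only these) ---"


@dataclass
class ChatMessage:
    author: str
    text: str


def _format(msg: ChatMessage) -> str:
    # Assumed line format; the real block layout is not shown in this commit.
    return f"{msg.author}: {msg.text}"


def build_conversation_block(context: list[ChatMessage], new: list[ChatMessage]) -> str:
    """Place already-scored context above the separator and new messages below it."""
    lines = [_format(m) for m in context]
    if context:
        # The prompt says the block "may contain" the separator, so this sketch
        # emits it only when there is context to separate from.
        lines.append(SEPARATOR)
    lines.extend(_format(m) for m in new)
    return "\n".join(lines)


def users_with_new_messages(new: list[ChatMessage]) -> list[str]:
    """Users who should receive a finding: one per author of a new message."""
    return list(dict.fromkeys(m.author for m in new))
```

For example, with one context message from `alice` and new messages from `alice` and `bob`, the block lists the context line, then the separator, then the two new lines, and both users are expected to receive exactly one finding each.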