fix: instruct LLM to never quote toxic content in note_updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -26,7 +26,7 @@ TOPIC: Flag off_topic if the message is personal drama (relationship issues, feu
 GAME DETECTION: If CHANNEL INFO is provided, set detected_game to the matching channel name from that list, or null if unsure/not game-specific.
-USER NOTES: If provided, use to calibrate (e.g. if notes say "uses heavy profanity casually", profanity alone should score lower). Add a note_update only for genuinely new behavioral observations; null otherwise.
+USER NOTES: If provided, use to calibrate (e.g. if notes say "uses heavy profanity casually", profanity alone should score lower). Add a note_update only for genuinely new behavioral observations; null otherwise. NEVER quote or repeat toxic/offensive language in note_update — describe patterns abstractly (e.g. "directed a personal insult at another user", NOT "called someone a [slur]").
 RULE ENFORCEMENT: If SERVER RULES are provided, report clearly violated rule numbers in violated_rules. Only flag clear violations, not borderline.
+2
-2
@@ -86,7 +86,7 @@ ANALYSIS_TOOL = {
 },
 "note_update": {
     "type": ["string", "null"],
-    "description": "Brief new observation about this user's style/behavior for future reference, or null if nothing new.",
+    "description": "Brief new observation about this user's style/behavior for future reference, or null if nothing new. NEVER quote toxic language — describe patterns abstractly (e.g. 'uses personal insults when frustrated').",
 },
 "detected_game": {
     "type": ["string", "null"],
@@ -189,7 +189,7 @@ CONVERSATION_TOOL = {
 },
 "note_update": {
     "type": ["string", "null"],
-    "description": "New observation about this user's pattern, or null.",
+    "description": "New observation about this user's pattern, or null. NEVER quote toxic language — describe patterns abstractly.",
 },
 "detected_game": {
     "type": ["string", "null"],
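Prompt instructions like the "NEVER quote" rule in this commit are advisory, so a caller may still want to check the model's output. The following is a minimal sketch of one such check and is not part of this commit: `quotes_message` and its `min_words` parameter are hypothetical names, and the heuristic (flagging any run of consecutive words shared with the original message) is an assumption that will miss single-word quotes and paraphrases.

```python
import re


def quotes_message(note_update, message, min_words=3):
    """Return True if note_update repeats min_words consecutive words from message.

    Hypothetical helper: note_update may be None, matching the
    ["string", "null"] type declared in the tool schema.
    """
    if note_update is None:
        return False

    # Compare on lowercase word tokens so punctuation and case are ignored.
    note_words = re.findall(r"\w+", note_update.lower())
    msg_words = re.findall(r"\w+", message.lower())

    # Every run of min_words consecutive words in the original message.
    msg_ngrams = {
        tuple(msg_words[i:i + min_words])
        for i in range(len(msg_words) - min_words + 1)
    }

    # Flag the note if any of its own runs appears verbatim in the message.
    return any(
        tuple(note_words[i:i + min_words]) in msg_ngrams
        for i in range(len(note_words) - min_words + 1)
    )
```

A caller could drop or regenerate a `note_update` for which this returns True, rather than storing it in the user's notes.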