feat: add jealousy/possessiveness detection as toxicity category

The LLM can now flag possessive name-dropping, territorial behavior, and
jealousy signals when users bring up other server members, especially
ones not in the current conversation. Scores feed into the existing
drama pipeline for warnings/mutes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 10:07:45 -05:00
parent 0449c8c30d
commit a73d2505d9
2 changed files with 3 additions and 0 deletions

@@ -20,6 +20,7 @@ IMPORTANT RULES:
- If a user is QUOTING or REPORTING what someone else said (e.g. "you called them X", "he said Y to her"), score based on the user's own intent, NOT the quoted words. Tattling, reporting, or referencing someone else's language is not the same as using that language aggressively. These should score 0.0-0.2 unless the user is clearly weaponizing the quote to attack someone.
- Sexually crude or vulgar remarks DIRECTED AT someone (e.g. "you watch that to cum", "bet you get off to that") = 0.5-0.7 and category "sexual_vulgar". Adding "lol" or "lmao" does NOT soften sexual content aimed at a person — it's still degrading. General sexual jokes not targeting anyone specific can score lower (0.2-0.3).
- "lol"/"lmao" softening ONLY applies to mild trash-talk and frustration. It does NOT reduce the score for sexual content directed at someone, genuine hostility, or targeted personal attacks.
- JEALOUSY / POSSESSIVENESS: Watch for users who name-drop or bring up another server member (especially one not in the current conversation) in a possessive, territorial, or jealousy-driven way. Examples: unprompted mentions of someone's name to stake a claim ("well MY friend X said...", "X always comes to ME first"), passive-aggressive references to someone else's relationship with a third party, or inserting someone's name into a conversation they're not part of to establish dominance. Score 0.3-0.5 for mild jealousy signals (bringing someone up unprompted, subtle possessiveness). Score 0.5-0.7 for overt possessiveness or territorial behavior directed at another user. Use category "jealousy". NOTE: Simply mentioning someone's name in normal conversation is NOT jealousy — there must be possessive, territorial, or competitive intent behind it.
Also determine if the message is on-topic (gaming, games, matches, strategy, LFG, etc.) or off-topic personal drama (relationship issues, personal feuds, venting about real-life problems, gossip about people outside the server).
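
The score bands in the new JEALOUSY / POSSESSIVENESS rule can be read as a simple lookup. The sketch below is illustrative only and is not part of this commit: the function name `jealousy_band` and the band labels are hypothetical, and because the prompt's two ranges both include 0.5, treating exactly 0.5 as "overt" is an assumption. The prompt also does not say what scores above 0.7 mean for this category, so the sketch simply groups them with "overt".

```python
def jealousy_band(score: float) -> str:
    """Map a "jealousy" score to the band named in the prompt rule.

    Hypothetical helper for illustration; the labels are not from the commit.
    """
    if score < 0.3:
        # Below the lowest range the rule gives for jealousy signals.
        return "below_threshold"
    if score < 0.5:
        # 0.3-0.5: unprompted mentions, subtle possessiveness.
        return "mild"
    # 0.5 and up: overt possessiveness or territorial behavior.
    # Assumption: the shared boundary value 0.5 counts as overt.
    return "overt"


if __name__ == "__main__":
    for s in (0.1, 0.35, 0.5, 0.6):
        print(s, jealousy_band(s))
```

The model itself assigns the score; a helper like this would only matter if downstream code needed to branch on the band.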

@@ -37,6 +37,7 @@ ANALYSIS_TOOL = {
"hostile",
"manipulative",
"sexual_vulgar",
"jealousy",
"none",
],
},
@@ -130,6 +131,7 @@ CONVERSATION_TOOL = {
"hostile",
"manipulative",
"sexual_vulgar",
"jealousy",
"none",
],
},
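
Since "jealousy" is added to the category enum of both tool schemas, code that consumes tool results may want to check the returned category against the same list. The following is a minimal sketch under stated assumptions, not code from this commit: `VISIBLE_CATEGORIES` holds only the entries visible in the hunks above (the real enum has earlier entries that this diff does not show), and `normalize_category` with its fallback to "none" is a hypothetical helper.

```python
# Only the enum entries visible in this diff; the real list starts earlier
# and contains entries not shown here.
VISIBLE_CATEGORIES = [
    "hostile",
    "manipulative",
    "sexual_vulgar",
    "jealousy",  # added by this commit
    "none",
]


def normalize_category(raw: str, allowed: list[str] = VISIBLE_CATEGORIES) -> str:
    """Return the category if it is allowed, else fall back to "none".

    Hypothetical helper: the fallback behavior is an assumption, not
    something the commit specifies.
    """
    cleaned = raw.strip().lower()
    return cleaned if cleaned in allowed else "none"


if __name__ == "__main__":
    print(normalize_category("jealousy"))
    print(normalize_category("Jealousy "))
    print(normalize_category("envy"))
```

In real use the `allowed` argument would be the full enum taken from the schema itself, so the two lists cannot drift apart.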