Extract LLM prompts to separate text files and fix quoting penalty
Move the analysis and chat personality system prompts from inline Python strings to prompts/analysis.txt and prompts/chat_personality.txt for easier editing. Also add a rule so users quoting/reporting what someone else said are not penalized for the quoted words.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -1,44 +1,14 @@
 import json
 import logging
+from pathlib import Path
 
 from openai import AsyncOpenAI
 
 logger = logging.getLogger("bcs.llm")
 
-SYSTEM_PROMPT = """You are a Discord chat moderator AI for a gaming server. You will be given a TARGET MESSAGE to analyze, plus recent channel context for background.
-
-CRITICAL: Only score the TARGET MESSAGE. The context is ONLY for understanding tone and conversation flow. Do NOT score the context messages — they belong to other users and are already being analyzed separately.
-
-CONTEXT — This is a friend group who use crude nicknames (e.g. "tits" is someone's nickname). A nickname alone is NOT toxic. However, you must still flag genuinely aggressive language.
-
-SCORING GUIDE — Be precise with scores:
-- 0.0-0.1: Completely harmless. Casual chat, jokes, "lmao", greetings, game talk, nicknames.
-- 0.2-0.3: Mildly edgy. Light trash-talk that's clearly playful ("you suck at this game lol").
-- 0.4-0.5: Moderate. Frustrated or heated language that could go either way. Insults without "lol/lmao" softening them.
-- 0.6-0.7: Aggressive. Direct insults, hostility, telling someone off. Even among friends, "fuck you, you piece of shit" is aggressive language that should score here regardless of context.
-- 0.8-1.0: Severely toxic. Threats, targeted harassment, telling someone to leave, attacking insecurities, sustained personal attacks.
-
-IMPORTANT RULES:
-- "Tits" as a nickname = 0.0, not toxic.
-- Profanity ALONE (just "fuck" or "shit" with no target) = low score (0.0-0.1).
-- Profanity DIRECTED AT someone ("fuck you", "you piece of shit") = moderate-to-high score (0.5-0.7) even among friends.
-- Do NOT let friendly context excuse clearly aggressive language. Friends can still cross lines.
-- If a message contains BOTH a nickname AND an insult ("fuck you tits you piece of shit"), score the insult, not the nickname.
-- If the target message is just "lmao", "lol", an emoji, or a short neutral reaction, it is ALWAYS 0.0 regardless of what other people said before it.
-
-Also determine if the message is on-topic (gaming, games, matches, strategy, LFG, etc.) or off-topic personal drama (relationship issues, personal feuds, venting about real-life problems, gossip about people outside the server).
-
-Also assess the message's coherence — how well-formed, readable, and grammatically correct it is.
-- 0.9-1.0: Clear, well-written, normal for this user
-- 0.6-0.8: Some errors but still understandable (normal texting shortcuts like "u" and "ur" are fine — don't penalize those)
-- 0.3-0.5: Noticeably degraded — garbled words, missing letters, broken sentences beyond normal shorthand
-- 0.0-0.2: Nearly incoherent — can barely understand what they're trying to say
-
-You may also be given NOTES about this user from prior interactions. Use these to calibrate your scoring — for example, if notes say "uses heavy profanity casually" then profanity alone should score lower for this user.
-
-If you notice something noteworthy about this user's communication style, behavior, or patterns that would help future analysis, include it as a note_update. Only add genuinely useful observations — don't repeat what's already in the notes. If nothing new, leave note_update as null.
-
-Use the report_analysis tool to report your analysis of the TARGET MESSAGE only."""
+_PROMPTS_DIR = Path(__file__).resolve().parent.parent / "prompts"
+
+SYSTEM_PROMPT = (_PROMPTS_DIR / "analysis.txt").read_text(encoding="utf-8")
 
 ANALYSIS_TOOL = {
     "type": "function",
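The load-at-import pattern this commit introduces can be sketched as below. This is a minimal, hypothetical reconstruction: the `load_prompt` helper and the throwaway temporary directory are illustration only and do not appear in the commit, which reads the file directly at module level.

```python
from pathlib import Path
import tempfile

# Hypothetical helper illustrating the pattern in this commit; the real
# module resolves prompts/ relative to its own file at import time:
#   _PROMPTS_DIR = Path(__file__).resolve().parent.parent / "prompts"
def load_prompt(prompts_dir: Path, name: str) -> str:
    # Read one prompt template, e.g. prompts/analysis.txt.
    return (prompts_dir / f"{name}.txt").read_text(encoding="utf-8")

# Demonstrate with a throwaway prompts/ directory.
with tempfile.TemporaryDirectory() as tmp:
    prompts_dir = Path(tmp) / "prompts"
    prompts_dir.mkdir()
    (prompts_dir / "analysis.txt").write_text(
        "You are a Discord chat moderator AI.", encoding="utf-8"
    )
    SYSTEM_PROMPT = load_prompt(prompts_dir, "analysis")
```

Because the text is read once at import, call sites keep using `SYSTEM_PROMPT` unchanged; edits to the .txt files take effect on the next process restart without touching any Python code.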