feat: require warning before mute + sustained toxicity escalation

Gate mutes behind a prior warning: a first offense always gets a warning,
and a mute only fires if warned_since_reset is True. The warned flag is
persisted to the DB (new Warned column on UserState) and survives restarts.
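
A minimal sketch of the warn-before-mute gate described above, not the actual implementation. The `UserState` dataclass here is an in-memory stand-in for the persisted row, and `choose_action` and the `mute_threshold` value of 0.75 are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    """Hypothetical in-memory stand-in for the persisted per-user row."""
    warned_since_reset: bool = False


def choose_action(state: UserState, drama_score: float,
                  mute_threshold: float = 0.75) -> str:
    """Return "none", "warn", or "mute" for one scored message.

    mute_threshold is an assumed value, not taken from the commit.
    """
    if drama_score < mute_threshold:
        return "none"
    if not state.warned_since_reset:
        # First offense: downgrade the mute to a warning and record it.
        state.warned_since_reset = True
        return "warn"
    # A warning was already issued since the last reset, so the mute may fire.
    return "mute"
```

In the real change the flag would be written to the database rather than kept only in memory, which is what lets it survive restarts.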

Add post-warning escalation boost to drama_score: each high-scoring
message after a warning adds +0.04 (configurable via escalation_boost) so
sustained bad behavior ramps toward the mute threshold instead of plateauing.
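
The boost arithmetic could look roughly like the sketch below. It assumes the boost is additive and linear in the number of high-scoring messages seen since the warning, and that the score is capped at 1.0; the function name, the counter argument, and the cap are illustrative assumptions rather than details from the commit.

```python
def boosted_drama_score(base_score: float,
                        high_msgs_since_warning: int,
                        escalation_boost: float = 0.04) -> float:
    """Add a per-message boost to the rolling drama score after a warning.

    high_msgs_since_warning is a hypothetical counter of high-scoring
    messages seen since the user was last warned.
    """
    boosted = base_score + escalation_boost * high_msgs_since_warning
    # Cap at 1.0 (an assumption) so the score stays in a bounded range.
    return min(1.0, boosted)
```

Under these assumptions, a user plateaued at a base score of 0.50 with a hypothetical mute threshold of 0.75 would reach it on the seventh high-scoring message after the warning, since six boosts give 0.74 and seven give 0.78.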

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:07:57 -05:00
parent f02a4ab49d
commit 71c7b45e9a
4 changed files with 56 additions and 16 deletions
+2 -1
@@ -17,8 +17,9 @@ sentiment:
context_messages: 8 # Number of previous messages to include as context
rolling_window_size: 10 # Number of messages to track per user
rolling_window_minutes: 15 # Time window for tracking
-batch_window_seconds: 10 # Wait this long for more messages before analyzing (debounce)
+batch_window_seconds: 4 # Wait this long for more messages before analyzing (debounce)
escalation_threshold: 0.25 # Triage toxicity score that triggers re-analysis with heavy model
+escalation_boost: 0.04 # Per-message drama boost after warning (sustained toxicity ramps toward mute)
game_channels:
gta-online: "GTA Online"