AJ Isaacs 6f63f36df0 Improve import resilience with per-message saves and duplicate handling
Save after each message to isolate failures, catch and skip duplicate
key violations (SQL error 2601), and clear change tracker on rollback
to prevent cascading failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 23:20:08 -05:00
2026-01-20 12:26:38 -05:00

Discord Archive Manager

A .NET 8 console application that parses DiscordChatExporter JSON exports and stores them in MSSQL with content-hashed image storage.

Features

  • Parses DiscordChatExporter JSON exports
  • Stores messages, users, channels, attachments, embeds, and reactions in MSSQL
  • Content-addressed image storage using SHA256 hashing (deduplicates identical files)
  • Tracks user profile changes over time via snapshots
  • Archives processed JSON files
  • Idempotent processing (skips already-processed files)

Project Structure

DiscordArchiveManager/
├── src/DiscordArchiveManager/
│   ├── Program.cs              # Entry point
│   ├── appsettings.json        # Configuration
│   ├── Models/
│   │   ├── DiscordExport.cs    # JSON deserialization models
│   │   └── Entities/           # EF Core entities
│   ├── Data/
│   │   └── DiscordArchiveContext.cs
│   └── Services/
│       ├── JsonImportService.cs
│       ├── ImageHashService.cs
│       └── ArchiveService.cs
├── Dockerfile
├── docker-compose.yml
└── README.md

Database Schema

  • Guilds: Discord servers
  • Channels: Text channels within guilds
  • Users: Discord users (basic info)
  • UserSnapshots: Historical user profile data (nickname, color, avatar)
  • Messages: Chat messages
  • Attachments: Files attached to messages (stored with content hash)
  • Embeds: Rich embeds in messages
  • Reactions: Emoji reactions on messages
  • Mentions: User mentions in messages
  • ProcessedFiles: Tracking for imported files

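The UserSnapshots table implies a change-detection rule: a new snapshot is written only when a user's tracked profile fields differ from their most recent snapshot. The sketch below models that rule in memory. It assumes the tracked fields are exactly nickname, color, and avatar (as listed above). The dictionary stands in for the table, and RecordSnapshotIfChanged is a hypothetical name, not the project's API.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the UserSnapshots table: user ID -> snapshots, oldest first.
var history = new Dictionary<ulong, List<(string? Nickname, string? Color, string? Avatar)>>();

// Hypothetical helper: append a snapshot only if the profile differs from the latest one.
static bool RecordSnapshotIfChanged(
    Dictionary<ulong, List<(string? Nickname, string? Color, string? Avatar)>> store,
    ulong userId,
    (string? Nickname, string? Color, string? Avatar) profile)
{
    if (!store.TryGetValue(userId, out var snapshots))
    {
        snapshots = new List<(string? Nickname, string? Color, string? Avatar)>();
        store[userId] = snapshots;
    }

    // Tuple equality compares every field, so a change to any tracked field counts.
    if (snapshots.Count > 0 && snapshots[^1] == profile)
        return false;

    snapshots.Add(profile);
    return true;
}

Console.WriteLine(RecordSnapshotIfChanged(history, 42, ("Ana", "#ff0000", "a1.png")));   // True: first sighting
Console.WriteLine(RecordSnapshotIfChanged(history, 42, ("Ana", "#ff0000", "a1.png")));   // False: unchanged
Console.WriteLine(RecordSnapshotIfChanged(history, 42, ("Ana B", "#ff0000", "a1.png"))); // True: nickname changed
```

Comparing against only the latest snapshot, rather than against every historical one, means a user who changes a nickname and later reverts it gets a new snapshot for the revert. That preserves the actual timeline of changes.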
Image Storage

Images are stored using a content-addressed system:

  1. Calculate SHA256 hash of the file
  2. Store at /images/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}

Example: A file with hash a1b2c3d4e5f6... and extension .png is stored at:

/images/a1/b2/a1b2c3d4e5f6....png

Benefits:

  • Automatic deduplication (identical files share storage)
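  • Even distribution across directories: two levels of two hex characters give at most 256 × 256 = 65,536 leaf directories, so no single directory grows very large
  • Fast lookup by hash: the storage path is derived directly from the hash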
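The two storage steps, along with the copy-if-new behavior described under Processing Flow, can be sketched with the .NET 8 base class library. StoreContentAddressed is a hypothetical helper, not the project's ImageHashService API. The sketch assumes lowercase hex (matching the example path) and keeps the source file's extension unchanged.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: hash a file and place it at {root}/{hash[0:2]}/{hash[2:4]}/{hash}{ext}.
// Returns the destination path; copies only when no file exists there yet.
static string StoreContentAddressed(string sourcePath, string imageRoot)
{
    string hash;
    using (var stream = File.OpenRead(sourcePath))
    {
        // Lowercase hex to match the example layout (a1/b2/a1b2c3...).
        hash = Convert.ToHexString(SHA256.HashData(stream)).ToLowerInvariant();
    }

    // Path.GetExtension keeps the leading dot, e.g. ".png".
    string ext = Path.GetExtension(sourcePath);
    string dir = Path.Combine(imageRoot, hash[..2], hash[2..4]);
    string dest = Path.Combine(dir, hash + ext);

    if (!File.Exists(dest))
    {
        Directory.CreateDirectory(dir);
        File.Copy(sourcePath, dest);
    }
    // If identical content was already stored, the existing path is simply reused.
    return dest;
}

// Usage: two files with identical bytes end up at one stored path.
string tmp = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(tmp);
File.WriteAllText(Path.Combine(tmp, "a.png"), "abc");
File.WriteAllText(Path.Combine(tmp, "b.png"), "abc");
string p1 = StoreContentAddressed(Path.Combine(tmp, "a.png"), Path.Combine(tmp, "images"));
string p2 = StoreContentAddressed(Path.Combine(tmp, "b.png"), Path.Combine(tmp, "images"));
Console.WriteLine(p1 == p2); // True
Console.WriteLine(Path.GetRelativePath(tmp, p1));
```

Because "abc" hashes to ba7816bf...15ad, both files land under images/ba/78/. The second call finds the file already present and skips the copy.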

Configuration

appsettings.json

{
  "ConnectionStrings": {
    "Discord": "Server=192.168.10.99;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true"
  },
  "Paths": {
    "InputDirectory": "/app/input",
    "ArchiveDirectory": "/app/archive",
    "ImageDirectory": "/app/images"
  }
}

Environment Variables

Configuration can also be set via environment variables:

  • ConnectionStrings__Discord: Database connection string
  • Paths__InputDirectory: Directory to scan for JSON files
  • Paths__ArchiveDirectory: Directory to move processed files into
  • Paths__ImageDirectory: Directory for content-hashed images
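.NET's environment-variable configuration provider treats a double underscore (__) as the : section separator. That is why Paths__InputDirectory overrides Paths:InputDirectory from appsettings.json. The sketch below illustrates only that naming rule, using plain BCL calls. GetSetting is a hypothetical helper; the application presumably relies on Microsoft.Extensions.Configuration rather than code like this.

```csharp
using System;

// Hypothetical helper mirroring the "__" <-> ":" convention:
// an environment variable, if set, wins over the value read from appsettings.json.
static string? GetSetting(string key, string? jsonValue) =>
    Environment.GetEnvironmentVariable(key.Replace(":", "__")) ?? jsonValue;

Environment.SetEnvironmentVariable("Paths__InputDirectory", "/data/exports");
Console.WriteLine(GetSetting("Paths:InputDirectory", "/app/input")); // /data/exports
```

In Docker Compose, the same names go under the service's environment: section, which avoids rebuilding the image to change paths or the connection string.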

Usage

With Docker Compose

  1. Create the input, archive, and images directories:

    mkdir -p input archive images
    
  2. Place DiscordChatExporter JSON exports in the input directory

  3. Update the connection string in docker-compose.yml

  4. Build and run:

    docker compose build
    docker compose up
    

Without Docker

  1. Ensure the .NET 8 SDK is installed

  2. Update appsettings.json with your configuration

  3. Build and run:

    cd src/DiscordArchiveManager
    dotnet run
    

DiscordChatExporter Export Format

This tool expects JSON exports from DiscordChatExporter.

When exporting, ensure:

  • Format: JSON
  • "Download assets" is enabled (for local attachment storage)

The tool expects each export's assets folder (the JSON file name with a _Files suffix) to sit alongside the JSON file:

exports/
├── general-2024-01-15.json
└── general-2024-01-15.json_Files/
    ├── attachment1.png
    └── avatar123.webp

Processing Flow

  1. Scan input directory for *.json files
  2. For each unprocessed file:
    • Parse JSON into model objects
    • Upsert Guild and Channel (idempotent)
    • Upsert Users and create snapshots for profile changes
    • Insert Messages (skip if ID exists)
    • Process attachments:
      • Calculate SHA256 hash
      • Copy to content-hashed location if new
      • Reference existing path if duplicate
    • Process embeds, reactions, and mentions
  3. Archive JSON file and _Files folder
  4. Record in ProcessedFiles table
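The file-level part of this flow (scan, skip, import, archive, record) can be sketched with the BCL alone. In the sketch below, a HashSet<string> of file names stands in for the ProcessedFiles table, and the import delegate stands in for the parsing and database steps. Both of these, along with ProcessInputDirectory, are hypothetical. Keying on the file name is also an assumption; the real table may identify files differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical driver for the flow above; returns the names of files it imported.
static List<string> ProcessInputDirectory(
    string inputDir, string archiveDir, HashSet<string> processed, Action<string> import)
{
    var imported = new List<string>();
    Directory.CreateDirectory(archiveDir);

    // Step 1: scan for *.json (sorted so runs are deterministic).
    foreach (var jsonPath in Directory.GetFiles(inputDir, "*.json").OrderBy(p => p, StringComparer.Ordinal))
    {
        string name = Path.GetFileName(jsonPath);
        if (processed.Contains(name))
            continue; // Already imported on a previous run.

        import(jsonPath); // Step 2: parse, upsert guild/channel/users, insert messages...

        // Step 3: archive the JSON file and its "<name>_Files" folder, if present.
        File.Move(jsonPath, Path.Combine(archiveDir, name));
        string assetsDir = jsonPath + "_Files";
        if (Directory.Exists(assetsDir))
            Directory.Move(assetsDir, Path.Combine(archiveDir, name + "_Files"));

        // Step 4: record the file so later runs skip it.
        processed.Add(name);
        imported.Add(name);
    }
    return imported;
}

string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
string input = Path.Combine(root, "input"), archive = Path.Combine(root, "archive");
Directory.CreateDirectory(Path.Combine(input, "general-2024-01-15.json_Files"));
File.WriteAllText(Path.Combine(input, "general-2024-01-15.json"), "{}");
var seen = new HashSet<string>();
var first = ProcessInputDirectory(input, archive, seen, path => Console.WriteLine($"importing {Path.GetFileName(path)}"));
```

Recording the file only after the import and archive steps succeed means a crash partway through leaves the file unrecorded. That file is retried on the next run, which is safe because the message and image checks described below are themselves idempotent.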

Re-running

The tool is safe to run multiple times:

  • Already-processed files are skipped (tracked in ProcessedFiles table)
  • Existing messages are not duplicated (checked by Discord message ID)
  • Duplicate images are not re-copied (checked by content hash)
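Message-level idempotency reduces to "insert only IDs not yet stored." The sketch below shows that check in memory. A Dictionary<ulong, string> stands in for the Messages table; Discord message IDs are 64-bit snowflakes, so ulong fits. Each message is handled individually, so one duplicate does not block the rest of the batch. InsertNewMessages is a hypothetical name; the real import checks the database instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: insert each message unless its Discord ID is already stored.
// Returns how many messages were newly inserted.
static int InsertNewMessages(Dictionary<ulong, string> store, IEnumerable<(ulong Id, string Content)> messages)
{
    int inserted = 0;
    foreach (var (id, content) in messages)
    {
        // TryAdd returns false when the key exists, so duplicates are skipped, never overwritten.
        if (store.TryAdd(id, content))
            inserted++;
    }
    return inserted;
}

var messagesTable = new Dictionary<ulong, string>();
var export = new[] { (1200000000000000001UL, "hello"), (1200000000000000002UL, "world") };
Console.WriteLine(InsertNewMessages(messagesTable, export)); // 2
Console.WriteLine(InsertNewMessages(messagesTable, export)); // 0: re-running adds nothing
```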