# Discord Archive Manager

A .NET 8 console application that parses DiscordChatExporter JSON exports and stores them in MSSQL with content-hashed image storage.

## Features

- Parses DiscordChatExporter JSON exports
- Stores messages, users, channels, attachments, embeds, and reactions in MSSQL
- Content-addressed image storage using SHA256 hashing (deduplicates identical files)
- Tracks user profile changes over time via snapshots
- Archives processed JSON files
- Idempotent processing (skips already-processed files)

## Project Structure

```
DiscordArchiveManager/
├── src/DiscordArchiveManager/
│   ├── Program.cs              # Entry point
│   ├── appsettings.json        # Configuration
│   ├── Models/
│   │   ├── DiscordExport.cs    # JSON deserialization models
│   │   └── Entities/           # EF Core entities
│   ├── Data/
│   │   └── DiscordArchiveContext.cs
│   └── Services/
│       ├── JsonImportService.cs
│       ├── ImageHashService.cs
│       └── ArchiveService.cs
├── Dockerfile
├── docker-compose.yml
└── README.md
```

## Database Schema

- **Guilds**: Discord servers
- **Channels**: Text channels within guilds
- **Users**: Discord users (basic info)
- **UserSnapshots**: Historical user profile data (nickname, color, avatar)
- **Messages**: Chat messages
- **Attachments**: Files attached to messages (stored with content hash)
- **Embeds**: Rich embeds in messages
- **Reactions**: Emoji reactions on messages
- **Mentions**: User mentions in messages
- **ProcessedFiles**: Tracking for imported files

## Image Storage

Images are stored using a content-addressed system:

1. Calculate SHA256 hash of the file
2. Store at `/images/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}`

Example: A file with hash `a1b2c3d4e5f6...` and extension `.png` is stored at:

```
/images/a1/b2/a1b2c3d4e5f6....png
```

Benefits:

- Automatic deduplication (identical files share storage)
- Even distribution across directories
- Fast lookup by hash

## Configuration

### appsettings.json

```json
{
  "ConnectionStrings": {
    "Discord": "Server=192.168.10.99;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true"
  },
  "Paths": {
    "InputDirectory": "/app/input",
    "ArchiveDirectory": "/app/archive",
    "ImageDirectory": "/app/images"
  }
}
```

### Environment Variables

Configuration can also be set via environment variables:

- `ConnectionStrings__Discord`: Database connection string
- `Paths__InputDirectory`: Directory to scan for JSON files
- `Paths__ArchiveDirectory`: Directory to move processed files
- `Paths__ImageDirectory`: Directory for content-hashed images

## Usage

### With Docker Compose

1. Create input/archive/images directories:

   ```bash
   mkdir -p input archive images
   ```

2. Place DiscordChatExporter JSON exports in the `input` directory

3. Update the connection string in `docker-compose.yml`

4. Build and run:

   ```bash
   docker compose build
   docker compose up
   ```

### Without Docker

1. Ensure .NET 8 SDK is installed

2. Update `appsettings.json` with your configuration

3. Build and run:

   ```bash
   cd src/DiscordArchiveManager
   dotnet run
   ```

## DiscordChatExporter Export Format

This tool expects JSON exports from [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter).

When exporting, ensure:

- Format: JSON
- "Download assets" is enabled (for local attachment storage)

The tool expects the `_Files` directory to be alongside the JSON file:

```
exports/
├── general-2024-01-15.json
└── general-2024-01-15.json_Files/
    ├── attachment1.png
    └── avatar123.webp
```

## Processing Flow

1. Scan input directory for `*.json` files
2. For each unprocessed file:
   - Parse JSON into model objects
   - Upsert Guild and Channel (idempotent)
   - Upsert Users and create snapshots for profile changes
   - Insert Messages (skip if ID exists)
   - Process attachments:
     - Calculate SHA256 hash
     - Copy to content-hashed location if new
     - Reference existing path if duplicate
   - Process embeds, reactions, and mentions
3. Archive JSON file and `_Files` folder
4. Record in ProcessedFiles table

## Re-running

The tool is safe to run multiple times:

- Already-processed files are skipped (tracked in ProcessedFiles table)
- Existing messages are not duplicated (checked by Discord message ID)
- Duplicate images are not re-copied (checked by content hash)
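
The content-hash check in the last point can be illustrated with a short shell sketch. This is not the tool's implementation (the tool is a .NET application); it only mirrors the `{hash[0:2]}/{hash[2:4]}/{hash}.{ext}` layout described under Image Storage. The function name `store_image` and its arguments are hypothetical, and the sketch assumes `sha256sum` is available and that the source file name has an extension.

```bash
# Illustrative sketch only: store_image is a hypothetical helper,
# not part of Discord Archive Manager.
#
# Copies a file into a content-addressed tree and prints the stored path.
# Usage: store_image <source-file> <image-root>
store_image() {
    local src="$1" root="$2"
    local hash ext dest

    # SHA256 of the file contents as lowercase hex
    # (first field of the sha256sum output).
    hash="$(sha256sum "$src" | cut -d' ' -f1)"

    # Keep the original extension (assumes the file name has one).
    ext="${src##*.}"

    # Layout: {root}/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}
    dest="$root/$(printf '%s' "$hash" | cut -c1-2)/$(printf '%s' "$hash" | cut -c3-4)/$hash.$ext"

    # Copy only when this content has not been stored before;
    # a duplicate simply resolves to the existing path.
    if [ ! -e "$dest" ]; then
        mkdir -p "$(dirname "$dest")"
        cp "$src" "$dest"
    fi

    printf '%s\n' "$dest"
}
```

Calling `store_image` on two files with identical contents and the same extension prints the same path both times and leaves a single copy on disk, which is why re-running over already-stored images is cheap.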