# Discord Archive Manager

A .NET 8 console application that parses DiscordChatExporter JSON exports and stores them in MSSQL with content-hashed image storage.

## Features

- Parses DiscordChatExporter JSON exports
- Stores messages, users, channels, attachments, embeds, and reactions in MSSQL
- Content-addressed image storage using SHA256 hashing (deduplicates identical files)
- Tracks user profile changes over time via snapshots
- Archives processed JSON files
- Idempotent processing (skips already-processed files)

## Project Structure

```
DiscordArchiveManager/
├── src/DiscordArchiveManager/
│   ├── Program.cs                    # Entry point
│   ├── appsettings.json              # Configuration
│   ├── Models/
│   │   ├── DiscordExport.cs          # JSON deserialization models
│   │   └── Entities/                 # EF Core entities
│   ├── Data/
│   │   └── DiscordArchiveContext.cs
│   └── Services/
│       ├── JsonImportService.cs
│       ├── ImageHashService.cs
│       └── ArchiveService.cs
├── Dockerfile
├── docker-compose.yml
└── README.md
```

## Database Schema

- **Guilds**: Discord servers
- **Channels**: Text channels within guilds
- **Users**: Discord users (basic info)
- **UserSnapshots**: Historical user profile data (nickname, color, avatar)
- **Messages**: Chat messages
- **Attachments**: Files attached to messages (stored with content hash)
- **Embeds**: Rich embeds in messages
- **Reactions**: Emoji reactions on messages
- **Mentions**: User mentions in messages
- **ProcessedFiles**: Tracking for imported files

## Image Storage

Images are stored using a content-addressed system:

1. Calculate SHA256 hash of the file
2. Store at `/images/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}`

Example: A file with hash `a1b2c3d4e5f6...` and extension `.png` is stored at:
```
/images/a1/b2/a1b2c3d4e5f6....png
```

Benefits:
- Automatic deduplication (identical files share storage)
- Even distribution across directories
- Fast lookup by hash

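The path scheme above can be reproduced from a shell, which is handy for locating a stored file by hand. This is a minimal sketch, not the tool's own code: `hashed_path` and `store_image` are hypothetical helper names, and it assumes GNU coreutils (`sha256sum`) and a source file name that has an extension.

```bash
#!/usr/bin/env bash
# Illustrative only: hashed_path and store_image are hypothetical helpers,
# not part of the tool. Assumes sha256sum is available and that the file
# name has an extension.

# Print the content-addressed path for a file under a given image root.
hashed_path() {
  local file="$1" root="${2:-/images}"
  local hash ext
  hash="$(sha256sum "$file" | cut -d' ' -f1)"   # 64 hex characters
  ext="${file##*.}"                              # text after the last dot
  printf '%s/%s/%s/%s.%s\n' "$root" "${hash:0:2}" "${hash:2:2}" "$hash" "$ext"
}

# Copy a file into the store only if that content is not already present.
store_image() {
  local src="$1" root="$2" dest
  dest="$(hashed_path "$src" "$root")"
  mkdir -p "$(dirname "$dest")"
  [ -e "$dest" ] || cp "$src" "$dest"
  printf '%s\n' "$dest"
}
```

Because the destination depends only on the file's bytes and extension, storing two identically named or differently named copies of the same image resolves to one path, and the second call copies nothing.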
## Configuration

### appsettings.json

```json
{
  "ConnectionStrings": {
    "Discord": "Server=192.168.10.99;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true"
  },
  "Paths": {
    "InputDirectory": "/app/input",
    "ArchiveDirectory": "/app/archive",
    "ImageDirectory": "/app/images"
  }
}
```

### Environment Variables

Configuration can also be set via environment variables:
- `ConnectionStrings__Discord`: Database connection string
- `Paths__InputDirectory`: Directory to scan for JSON files
- `Paths__ArchiveDirectory`: Directory to move processed files
- `Paths__ImageDirectory`: Directory for content-hashed images

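In .NET configuration, a double underscore in an environment variable name stands for the `:` section separator, so `Paths__InputDirectory` maps to the `Paths:InputDirectory` key. A sketch of setting all four variables for a local run follows; every value is a placeholder, and it assumes the application reads environment variables after `appsettings.json`, as the default .NET host setup does.

```bash
# Placeholder values: replace them with your own server, credentials, and paths.
export ConnectionStrings__Discord='Server=localhost;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true'
export Paths__InputDirectory="$PWD/input"
export Paths__ArchiveDirectory="$PWD/archive"
export Paths__ImageDirectory="$PWD/images"
```

Variables exported this way apply only to the current shell session and to processes started from it.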
## Usage

### With Docker Compose

1. Create input/archive/images directories:
   ```bash
   mkdir -p input archive images
   ```

2. Place DiscordChatExporter JSON exports in the `input` directory

3. Update the connection string in `docker-compose.yml`

4. Build and run:
   ```bash
   docker compose build
   docker compose up
   ```

### Without Docker

1. Ensure .NET 8 SDK is installed

2. Update `appsettings.json` with your configuration

3. Build and run:
   ```bash
   cd src/DiscordArchiveManager
   dotnet run
   ```

## DiscordChatExporter Export Format

This tool expects JSON exports from [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter).

When exporting, ensure:
- Format: JSON
- "Download assets" is enabled (for local attachment storage)

The tool expects the `_Files` directory to be alongside the JSON file:
```
exports/
├── general-2024-01-15.json
└── general-2024-01-15.json_Files/
    ├── attachment1.png
    └── avatar123.webp
```

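Before running an import, the expected layout can be checked from a shell. The following is a rough sketch rather than part of the tool: `check_exports` is a hypothetical helper that reports every JSON export without a sibling `<name>.json_Files` folder. A missing folder may simply mean the export had no downloaded assets, so treat the output as a prompt to look, not as an error.

```bash
#!/usr/bin/env bash
# Illustrative only: check_exports is a hypothetical helper, not part of the tool.
# Reports every JSON export in a directory that has no sibling "<name>.json_Files"
# folder. Returns 0 when all exports have one, 1 otherwise.
check_exports() {
  local dir="$1" json missing=0
  for json in "$dir"/*.json; do
    [ -e "$json" ] || continue            # no matches: the glob stays literal
    if [ ! -d "${json}_Files" ]; then
      printf 'missing assets: %s\n' "${json}_Files"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_exports input` checks the `input` directory used in the Docker Compose steps.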
## Processing Flow

1. Scan input directory for `*.json` files
2. For each unprocessed file:
   - Parse JSON into model objects
   - Upsert Guild and Channel (idempotent)
   - Upsert Users and create snapshots for profile changes
   - Insert Messages (skip if ID exists)
   - Process attachments:
     - Calculate SHA256 hash
     - Copy to content-hashed location if new
     - Reference existing path if duplicate
   - Process embeds, reactions, and mentions
3. Archive JSON file and `_Files` folder
4. Record in ProcessedFiles table

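The outer loop of this flow (scan, skip, archive, record) can be sketched in a few lines of shell. This is an illustration under loud assumptions, not the tool's implementation: a plain text ledger keyed by file name stands in for the ProcessedFiles table, and `import_file` is a hypothetical placeholder for the database import in step 2.

```bash
#!/usr/bin/env bash
# Illustrative only: a text ledger stands in for the ProcessedFiles table, and
# import_file is a hypothetical no-op placeholder for the real database import.
import_file() { :; }

process_inputs() {
  local input="$1" archive="$2" ledger="$3" json name count=0
  mkdir -p "$archive"
  touch "$ledger"
  for json in "$input"/*.json; do
    [ -e "$json" ] || continue                  # no matches: the glob stays literal
    name="$(basename "$json")"
    # Skip files whose name is already recorded in the ledger.
    if grep -Fxq -- "$name" "$ledger"; then continue; fi
    import_file "$json"
    # Archive the JSON file and, if present, its _Files folder.
    mv "$json" "$archive/"
    if [ -d "${json}_Files" ]; then mv "${json}_Files" "$archive/"; fi
    printf '%s\n' "$name" >> "$ledger"          # record as processed
    count=$((count + 1))
  done
  printf 'processed %d file(s)\n' "$count"
}
```

Recording the file only after the import and the archive move succeed means an interrupted run leaves the file unrecorded, so the next run picks it up again.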
## Re-running

The tool is safe to run multiple times:
- Already-processed files are skipped (tracked in ProcessedFiles table)
- Existing messages are not duplicated (checked by Discord message ID)
- Duplicate images are not re-copied (checked by content hash)