Initial commit

Add Discord Archive Manager project with:
- Entity Framework Core data models for Discord exports
- JSON import service for processing Discord chat exports
- Archive service for managing imported data
- Docker configuration for containerized deployment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 12:26:38 -05:00
commit 2633bbf37a
24 changed files with 1635 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,159 @@
# Discord Archive Manager
A .NET 8 console application that parses DiscordChatExporter JSON exports, stores the data in Microsoft SQL Server (MSSQL), and keeps images in content-addressed storage keyed by SHA-256 hash.
## Features
- Parses DiscordChatExporter JSON exports
- Stores messages, users, channels, attachments, embeds, and reactions in MSSQL
- Content-addressed image storage using SHA-256 hashing (identical files are stored once)
- Tracks user profile changes over time via snapshots
- Archives processed JSON files
- Idempotent processing (skips already-processed files)
## Project Structure
```
DiscordArchiveManager/
├── src/DiscordArchiveManager/
│ ├── Program.cs # Entry point
│ ├── appsettings.json # Configuration
│ ├── Models/
│ │ ├── DiscordExport.cs # JSON deserialization models
│ │ └── Entities/ # EF Core entities
│ ├── Data/
│ │ └── DiscordArchiveContext.cs
│ └── Services/
│ ├── JsonImportService.cs
│ ├── ImageHashService.cs
│ └── ArchiveService.cs
├── Dockerfile
├── docker-compose.yml
└── README.md
```
## Database Schema
- **Guilds**: Discord servers
- **Channels**: Text channels within guilds
- **Users**: Discord users (basic info)
- **UserSnapshots**: Historical user profile data (nickname, color, avatar)
- **Messages**: Chat messages
- **Attachments**: Files attached to messages (stored with content hash)
- **Embeds**: Rich embeds in messages
- **Reactions**: Emoji reactions on messages
- **Mentions**: User mentions in messages
- **ProcessedFiles**: Tracking for imported files
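
The "snapshot on change" idea behind **UserSnapshots** can be sketched as follows. This is a minimal Python illustration, not the project's C# code: the field names come from the list above, while the types, the `UserSnapshot` class, and the `record_snapshot` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class UserSnapshot:
    """Profile fields named in the schema list; the types are assumptions."""
    nickname: Optional[str]
    color: Optional[str]
    avatar: Optional[str]


def record_snapshot(history: List[UserSnapshot], current: UserSnapshot) -> bool:
    """Append `current` only if it differs from the latest stored snapshot.

    Returns True when a new snapshot was written.
    """
    if history and history[-1] == current:
        return False  # profile unchanged, nothing to store
    history.append(current)
    return True
```

Comparing against only the most recent snapshot keeps the history compact while still recording a profile that changes back to an earlier value.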
## Image Storage
Images are stored using a content-addressed system:
1. Calculate SHA256 hash of the file
2. Store at `/images/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}`
Example: A file with hash `a1b2c3d4e5f6...` and extension `.png` is stored at:
```
/images/a1/b2/a1b2c3d4e5f6....png
```
Benefits:
- Automatic deduplication (identical files share storage)
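- Even distribution across directories (the two hex-prefix levels give up to 256 × 256 = 65,536 buckets, which keeps individual directories small)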
- Fast lookup by hash
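
The layout above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code in `ImageHashService.cs`: the function names are hypothetical, the digest is assumed to be lowercase hex, and the handling of files without an extension is a guess.

```python
import hashlib
import shutil
from pathlib import Path


def content_path(image_root: Path, digest: str, ext: str) -> Path:
    """Build {root}/{digest[0:2]}/{digest[2:4]}/{digest}.{ext}."""
    # Assumption: a file without an extension is stored under the bare digest.
    name = f"{digest}.{ext.lstrip('.')}" if ext else digest
    return image_root / digest[0:2] / digest[2:4] / name


def sha256_of_file(source: Path) -> str:
    """Hash the file in 1 MiB chunks so large attachments are not read at once."""
    h = hashlib.sha256()
    with source.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def store_image(source: Path, image_root: Path) -> Path:
    """Copy `source` into content-addressed storage unless it is already there."""
    target = content_path(image_root, sha256_of_file(source), source.suffix)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
    return target
```

Because the target path depends only on the file's bytes and extension, two attachments with identical content and extension resolve to the same path, and the second copy is skipped.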
## Configuration
### appsettings.json
```json
{
"ConnectionStrings": {
"Discord": "Server=192.168.10.99;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true"
},
"Paths": {
"InputDirectory": "/app/input",
"ArchiveDirectory": "/app/archive",
"ImageDirectory": "/app/images"
}
}
```
### Environment Variables
Configuration can also be set via environment variables:
- `ConnectionStrings__Discord`: Database connection string
- `Paths__InputDirectory`: Directory to scan for JSON files
- `Paths__ArchiveDirectory`: Directory that processed files are moved to
- `Paths__ImageDirectory`: Directory for content-hashed images
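
In .NET configuration, a double underscore in an environment variable name stands for the `:` section separator, so `Paths__InputDirectory` addresses the `InputDirectory` key inside `Paths`. The Python sketch below only illustrates that mapping for the two-level layout shown in `appsettings.json`. The `overlay_env` helper is hypothetical, it is case-sensitive (unlike .NET configuration keys), and it assumes environment variables take precedence over the JSON file, which depends on how `Program.cs` builds its configuration.

```python
import copy
from typing import Any, Dict, Mapping


def overlay_env(settings: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `settings` with `Section__Key` variables applied on top."""
    merged = copy.deepcopy(settings)
    for name, value in env.items():
        section, sep, key = name.partition("__")
        # Only touch variables that name an existing top-level section.
        if sep and isinstance(merged.get(section), dict):
            merged[section][key] = value
    return merged
```

For example, passing `{"Paths__InputDirectory": "/data/in"}` as `env` replaces only that one path and leaves the other settings untouched.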
## Usage
### With Docker Compose
1. Create input/archive/images directories:
```bash
mkdir -p input archive images
```
2. Place DiscordChatExporter JSON exports in the `input` directory
3. Update the connection string in `docker-compose.yml`
4. Build and run:
```bash
docker compose build
docker compose up
```
### Without Docker
1. Ensure .NET 8 SDK is installed
2. Update `appsettings.json` with your configuration
3. Build and run:
```bash
cd src/DiscordArchiveManager
dotnet run
```
## DiscordChatExporter Export Format
This tool expects JSON exports from [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter).
When exporting, ensure:
- Format: JSON
- "Download assets" is enabled (for local attachment storage)
The tool expects the `_Files` directory to be alongside the JSON file:
```
exports/
├── general-2024-01-15.json
└── general-2024-01-15.json_Files/
    ├── attachment1.png
    └── avatar123.webp
```
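
Locating the assets directory for a given export follows directly from this naming convention. A minimal Python sketch, where `assets_dir_for` is a hypothetical helper and not part of the project:

```python
from pathlib import Path
from typing import Optional


def assets_dir_for(export_json: Path) -> Optional[Path]:
    """Return the sibling `<name>.json_Files` directory, or None if it is missing."""
    candidate = export_json.with_name(export_json.name + "_Files")
    return candidate if candidate.is_dir() else None
```

Returning `None` lets the caller decide how to handle an export that was created without "Download assets" enabled.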
## Processing Flow
1. Scan input directory for `*.json` files
2. For each unprocessed file:
   - Parse JSON into model objects
   - Upsert Guild and Channel (idempotent)
   - Upsert Users and create snapshots for profile changes
   - Insert Messages (skip if ID exists)
   - Process attachments:
     - Calculate SHA256 hash
     - Copy to content-hashed location if new
     - Reference existing path if duplicate
   - Process embeds, reactions, and mentions
3. Archive JSON file and `_Files` folder
4. Record in ProcessedFiles table
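
The skip logic in this flow can be sketched with in-memory stand-ins for the database tables. This Python outline is an illustration only: `import_exports` is a hypothetical function, the `messages` and `id` keys are assumed to match the DiscordChatExporter JSON layout, and the guild, user, attachment, and archiving steps are left out.

```python
from typing import Any, Dict, Set


def import_exports(
    exports: Dict[str, Dict[str, Any]],   # file name -> parsed export (stand-in for files on disk)
    processed_files: Set[str],            # stand-in for the ProcessedFiles table
    messages: Dict[str, Dict[str, Any]],  # stand-in for the Messages table, keyed by message ID
) -> int:
    """Import every unprocessed export and return the number of new messages."""
    inserted = 0
    for name in sorted(exports):
        if name in processed_files:
            continue  # step 2: only unprocessed files are handled
        for message in exports[name]["messages"]:
            if message["id"] in messages:
                continue  # the insert is skipped when the ID already exists
            messages[message["id"]] = message
            inserted += 1
        processed_files.add(name)  # step 4: record the file as processed
    return inserted
```

Calling the function a second time with the same inputs inserts nothing, which is the behavior that makes repeated runs safe.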
## Re-running
The tool is safe to run multiple times:
- Already-processed files are skipped (tracked in ProcessedFiles table)
- Existing messages are not duplicated (checked by Discord message ID)
- Duplicate images are not re-copied (checked by content hash)