Initial commit
Add Discord Archive Manager project with:
- Entity Framework Core data models for Discord exports
- JSON import service for processing Discord chat exports
- Archive service for managing imported data
- Docker configuration for containerized deployment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
README.md
@@ -0,0 +1,159 @@
# Discord Archive Manager

A .NET 8 console application that parses DiscordChatExporter JSON exports and stores them in MSSQL with content-hashed image storage.

## Features

- Parses DiscordChatExporter JSON exports
- Stores messages, users, channels, attachments, embeds, and reactions in MSSQL
- Content-addressed image storage using SHA256 hashing (deduplicates identical files)
- Tracks user profile changes over time via snapshots
- Archives processed JSON files
- Idempotent processing (skips already-processed files)

## Project Structure

```
DiscordArchiveManager/
├── src/DiscordArchiveManager/
│   ├── Program.cs                    # Entry point
│   ├── appsettings.json              # Configuration
│   ├── Models/
│   │   ├── DiscordExport.cs          # JSON deserialization models
│   │   └── Entities/                 # EF Core entities
│   ├── Data/
│   │   └── DiscordArchiveContext.cs
│   └── Services/
│       ├── JsonImportService.cs
│       ├── ImageHashService.cs
│       └── ArchiveService.cs
├── Dockerfile
├── docker-compose.yml
└── README.md
```

## Database Schema

- **Guilds**: Discord servers
- **Channels**: Text channels within guilds
- **Users**: Discord users (basic info)
- **UserSnapshots**: Historical user profile data (nickname, color, avatar)
- **Messages**: Chat messages
- **Attachments**: Files attached to messages (stored with content hash)
- **Embeds**: Rich embeds in messages
- **Reactions**: Emoji reactions on messages
- **Mentions**: User mentions in messages
- **ProcessedFiles**: Tracking for imported files

## Image Storage

Images are stored using a content-addressed system:

1. Calculate SHA256 hash of the file
2. Store at `/images/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}`

Example: A file with hash `a1b2c3d4e5f6...` and extension `.png` is stored at:
```
/images/a1/b2/a1b2c3d4e5f6....png
```

Benefits:
- Automatic deduplication (identical files share storage)
- Even distribution across directories
- Fast lookup by hash

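The path rule above can be sketched in a few lines. This is an illustration in Python, not the project's C# `ImageHashService`; the function name `content_path` and its `root` parameter are hypothetical, and lowercase hex encoding of the hash is assumed from the example path.

```python
import hashlib
from pathlib import PurePosixPath


def content_path(data: bytes, ext: str, root: str = "/images") -> PurePosixPath:
    """Return the content-addressed path for a file's bytes.

    Layout: {root}/{hash[0:2]}/{hash[2:4]}/{hash}.{ext}
    """
    # hexdigest() yields 64 lowercase hex characters for SHA-256.
    digest = hashlib.sha256(data).hexdigest()
    # Accept both "png" and ".png" so the dot is never doubled.
    ext = ext.lstrip(".")
    return PurePosixPath(root) / digest[0:2] / digest[2:4] / f"{digest}.{ext}"


if __name__ == "__main__":
    print(content_path(b"example image bytes", ".png"))
```

Because the path depends only on the file's bytes and extension, two identical files always map to the same location, which is what makes the deduplication automatic.
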
## Configuration

### appsettings.json

```json
{
  "ConnectionStrings": {
    "Discord": "Server=192.168.10.99;Database=DiscordArchive;User Id=sa;Password=YourPassword;TrustServerCertificate=true"
  },
  "Paths": {
    "InputDirectory": "/app/input",
    "ArchiveDirectory": "/app/archive",
    "ImageDirectory": "/app/images"
  }
}
```

### Environment Variables

Configuration can also be set via environment variables:
- `ConnectionStrings__Discord`: Database connection string
- `Paths__InputDirectory`: Directory to scan for JSON files
- `Paths__ArchiveDirectory`: Directory to move processed files
- `Paths__ImageDirectory`: Directory for content-hashed images

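The double underscore in these names is how .NET's environment variable configuration provider spells the `:` that separates configuration sections, so `Paths__InputDirectory` addresses the same key as `Paths` → `InputDirectory` in `appsettings.json`. The mapping can be illustrated with a small Python sketch; `nest_env` is a hypothetical helper written for this example and is not how .NET implements it.

```python
from typing import Dict


def nest_env(env: Dict[str, str]) -> dict:
    """Fold flat NAME__CHILD keys into nested dictionaries."""
    config: dict = {}
    for key, value in env.items():
        # "Paths__InputDirectory" -> parents ["Paths"], leaf "InputDirectory"
        *parents, leaf = key.split("__")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    example = {
        "Paths__InputDirectory": "/data/input",
        "Paths__ImageDirectory": "/data/images",
    }
    print(nest_env(example))
```
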
## Usage

### With Docker Compose

1. Create input/archive/images directories:
   ```bash
   mkdir -p input archive images
   ```

2. Place DiscordChatExporter JSON exports in the `input` directory

3. Update the connection string in `docker-compose.yml`

4. Build and run:
   ```bash
   docker compose build
   docker compose up
   ```

### Without Docker

1. Ensure .NET 8 SDK is installed

2. Update `appsettings.json` with your configuration

3. Build and run:
   ```bash
   cd src/DiscordArchiveManager
   dotnet run
   ```

## DiscordChatExporter Export Format

This tool expects JSON exports from [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter).

When exporting, ensure:
- Format: JSON
- "Download assets" is enabled (for local attachment storage)

The tool expects the `_Files` directory to be alongside the JSON file:
```
exports/
├── general-2024-01-15.json
└── general-2024-01-15.json_Files/
    ├── attachment1.png
    └── avatar123.webp
```

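To check an export folder against this layout before running the tool, something like the following Python sketch may help. It assumes only the naming shown in the tree above (the assets directory is the JSON file name with `_Files` appended); `assets_dir_for` and `find_exports` are hypothetical names for this example, not part of the tool.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def assets_dir_for(export_json: Path) -> Path:
    """Return the sibling `<name>.json_Files` directory for an export file."""
    return export_json.with_name(export_json.name + "_Files")


def find_exports(input_dir: Path) -> List[Tuple[Path, Optional[Path]]]:
    """Pair each *.json export with its assets directory, or None if absent."""
    pairs = []
    for export_json in sorted(input_dir.glob("*.json")):
        assets = assets_dir_for(export_json)
        pairs.append((export_json, assets if assets.is_dir() else None))
    return pairs


if __name__ == "__main__":
    for export_json, assets in find_exports(Path("input")):
        status = "ok" if assets else "missing _Files directory"
        print(f"{export_json.name}: {status}")
```
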
## Processing Flow

1. Scan input directory for `*.json` files
2. For each unprocessed file:
   - Parse JSON into model objects
   - Upsert Guild and Channel (idempotent)
   - Upsert Users and create snapshots for profile changes
   - Insert Messages (skip if ID exists)
   - Process attachments:
     - Calculate SHA256 hash
     - Copy to content-hashed location if new
     - Reference existing path if duplicate
   - Process embeds, reactions, and mentions
3. Archive JSON file and `_Files` folder
4. Record in ProcessedFiles table

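The "create snapshots for profile changes" step can be pictured as follows. This Python sketch assumes a snapshot is written only when the nickname, color, or avatar differs from the user's most recent snapshot; that comparison rule, the `Profile` type, and `record_profile` are assumptions made for illustration and are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Profile:
    """The fields the UserSnapshots table is described as tracking."""
    nickname: str
    color: Optional[str]
    avatar: Optional[str]


def record_profile(history: Dict[str, List[Profile]], user_id: str, seen: Profile) -> bool:
    """Append a snapshot if the profile changed; return True when one was added."""
    snapshots = history.setdefault(user_id, [])
    # Dataclass equality compares all three fields at once.
    if snapshots and snapshots[-1] == seen:
        return False
    snapshots.append(seen)
    return True


if __name__ == "__main__":
    history: Dict[str, List[Profile]] = {}
    record_profile(history, "1001", Profile("alice", "#FF0000", "avatar123.webp"))
    record_profile(history, "1001", Profile("alice", "#FF0000", "avatar123.webp"))
    record_profile(history, "1001", Profile("alice-away", "#FF0000", "avatar123.webp"))
    print(len(history["1001"]), "snapshots")
```

Comparing against only the latest snapshot keeps re-imports of the same export from adding rows while still recording a profile that later reverts to an earlier value.
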
## Re-running

The tool is safe to run multiple times:
- Already-processed files are skipped (tracked in ProcessedFiles table)
- Existing messages are not duplicated (checked by Discord message ID)
- Duplicate images are not re-copied (checked by content hash)

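These three checks can be modeled with in-memory sets standing in for the database tables. The Python sketch below is an illustration only: `ArchiveState` and `import_export` are hypothetical, and keying processed files by file name is an assumption, since the README does not say how the ProcessedFiles table identifies a file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ArchiveState:
    """Sets standing in for ProcessedFiles, Messages, and stored image hashes."""
    processed_files: Set[str] = field(default_factory=set)
    message_ids: Set[str] = field(default_factory=set)
    image_hashes: Set[str] = field(default_factory=set)


def import_export(
    state: ArchiveState,
    file_name: str,
    messages: List[Tuple[str, List[str]]],
) -> Dict[str, object]:
    """Import (message_id, [attachment_hash, ...]) pairs and count what was new."""
    if file_name in state.processed_files:
        return {"skipped_file": True, "messages": 0, "images": 0}

    new_messages = 0
    new_images = 0
    for message_id, attachment_hashes in messages:
        if message_id in state.message_ids:
            continue  # message already stored by an earlier export
        state.message_ids.add(message_id)
        new_messages += 1
        for digest in attachment_hashes:
            if digest not in state.image_hashes:
                state.image_hashes.add(digest)  # first copy of this content
                new_images += 1

    state.processed_files.add(file_name)
    return {"skipped_file": False, "messages": new_messages, "images": new_images}
```

Running the same export twice leaves the state unchanged on the second pass, and a later export that overlaps an earlier one adds only the messages and images not seen before.
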