🧩 Leetdle
A daily coding challenge game inspired by Wordle — but for code. Every day at midnight (Paris time), a new AI-generated coding challenge is published. Players solve it in Python, JavaScript, TypeScript, Rust, C++, or C, with their code executed in secure, sandboxed Docker containers.
🎯 Project Overview
Leetdle is a full-stack web application where users:
- Read a daily coding challenge with description, examples, and test cases
- Write a solution in their preferred language using an in-browser Monaco editor
- Submit their code, which runs against hidden test cases inside an isolated Docker container
- Compete — solve it in fewer tries and less time, then share results
The challenge is automatically generated by AI (deepseek/deepseek-r1 via OpenRouter), validated by running AI-generated solutions against the test cases, and saved to a SQLite database — all without human intervention.
🏗️ Architecture
```mermaid
graph TB
  subgraph Frontend["Frontend (Next.js)"]
    SSR["Server-Side Rendering<br/>page.tsx"]
    GC["GameClient.tsx<br/>Monaco Editor + UI"]
  end
  subgraph Backend["Backend (Hono + Node.js)"]
    API["REST API<br/>/api/challenge<br/>/api/execute<br/>/api/submissions"]
    AI["AI Service<br/>deepseek/deepseek-r1 (via OpenRouter)"]
    EXEC["Executor Service"]
    FETCH["Fetcher Job<br/>Midnight Scheduler"]
    DB["SQLite<br/>Drizzle ORM"]
  end
  subgraph Executors["Docker Executor Containers"]
    PY["🐍 leetdle-python<br/>Alpine + Python3"]
    JS["📜 leetdle-javascript<br/>Alpine + Node.js"]
    TS["🔷 leetdle-typescript<br/>Alpine + Node.js + ts-node"]
    RS["🦀 leetdle-rust<br/>rust:alpine + Cargo"]
    CPP["⚙️ leetdle-cpp<br/>Alpine + g++ (C++17)"]
    C["🔧 leetdle-c<br/>Alpine + gcc (C17)"]
  end
  SSR -->|"fetch /api/challenge/daily"| API
  GC -->|"POST /api/execute"| API
  GC -->|"POST /api/submissions"| API
  API --> EXEC
  API --> DB
  FETCH -->|"Midnight Paris"| AI
  AI -->|"Validate solutions"| EXEC
  FETCH --> DB
  EXEC -->|"docker run --rm"| PY
  EXEC -->|"docker run --rm"| JS
  EXEC -->|"docker run --rm"| TS
  EXEC -->|"docker run --rm"| RS
  EXEC -->|"docker run --rm"| CPP
  EXEC -->|"docker run --rm"| C
  style Frontend fill:#1a1a2e,stroke:#7c3aed,color:#fff
  style Backend fill:#1a1a2e,stroke:#06b6d4,color:#fff
  style Executors fill:#1a1a2e,stroke:#ef4444,color:#fff
```
⚙️ Challenge Generation & Validation Pipeline
Every day at midnight (Europe/Paris), the Fetcher Job triggers a fully automated pipeline. It uses multiple AI prompts, not just one, across three distinct phases plus two conditional repair steps:
| Phase | AI calls | Notes |
|---|---|---|
| Challenge generation | 1 prompt | Full challenge JSON: title, description, testCases, solutions ×6, stubs ×6, benchmarkTestCase |
| Test runner generation | 6 prompts (parallel, one per language) | Per-language runner with {{USER_CODE}} placeholder |
| Solution fixing (if needed) | 0–3 prompts per language | 0 if the solution from step 1 passes first try; up to 1 initial + 2 AI-assisted fixes |
| Benchmark runner generation | 6 prompts (parallel, one per language) | Per-language runner for 1000-iteration timing loop |
| Runner regeneration (if corrupt) | +1 prompt per language | Triggered if non-ASCII chars detected in error output |
Best case: 13 AI calls (1 challenge + 6 test runners + 6 bench runners, all pass first try). Worst case per attempt: ~37 AI calls (+ runner regens + solution fixes across all 6 languages). The entire pipeline retries up to 3 times if any language fails validation.
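The call counts above can be sanity-checked with quick arithmetic. The per-language maxima below are reconstructed from the table and flowchart, so treat this as a sketch of the accounting rather than the pipeline's actual bookkeeping:

```typescript
// Reconstructed AI-call accounting (assumption: "worst case" means one
// runner regen per language, two solution fixes per language, and one
// benchmark-runner retry per language, all within a single attempt).
const LANGS = 6;

const bestCase = 1 + LANGS + LANGS; // challenge + 6 test runners + 6 bench runners

const worstCasePerAttempt =
  1 +         // challenge generation
  LANGS +     // test runner generation
  LANGS +     // test runner regeneration (max 1 per language)
  LANGS * 2 + // AI-assisted solution fixes (max 2 per language)
  LANGS +     // benchmark runner generation
  LANGS;      // benchmark runner retry (max 1 per language)

console.log(bestCase, worstCasePerAttempt); // 13 37
```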
```mermaid
flowchart TD
  A["⏰ Midnight Paris<br/>Fetcher Job triggers"] --> B{"Today's challenge<br/>exists?"}
  B -->|Yes| SKIP["Skip — already done"]
  B -->|No| FETCH["Fetch last 180 challenge titles<br/>(deduplication)"]
  FETCH --> C["🤖 PROMPT 1 — Challenge generation<br/>AI generates full JSON:<br/>title · description · difficulty<br/>testCases · solutions ×6 · stubs ×6<br/>benchmarkTestCase"]
  C --> PARSE["Parse JSON response"]
  PARSE --> PARALLEL
  subgraph PARALLEL["🤖 PROMPTS 2–7 — Test Runner Generation + Validation (6 languages, in parallel)"]
    direction TB
    TR["🤖 AI generates test runner<br/>for this language"]
    TR --> TRVAL{"Runner contains<br/>{{USER_CODE}}?"}
    TRVAL -->|No — throw| TR
    TRVAL -->|Yes| SOL["Use solution from challenge JSON<br/>(or ask AI if missing)"]
    SOL --> RUN["▶ docker run<br/>code injected · test cases injected at runtime"]
    RUN --> PASS{"All tests<br/>passed?"}
    PASS -->|Yes ✅| DONE["Language validated"]
    PASS -->|No| CORRUPT{"Non-ASCII chars<br/>in error output?"}
    CORRUPT -->|Yes — corrupt runner| REGEN["🤖 Regenerate test runner<br/>(max 1 regen, doesn't count as fix)"]
    REGEN --> RUN
    CORRUPT -->|No| SAME{"Same error as<br/>previous attempt?"}
    SAME -->|Yes — stuck| FAIL["❌ Language failed"]
    SAME -->|No| FIX{"Fix attempts<br/>remaining? (max 2)"}
    FIX -->|Yes| FIXSOL["🤖 AI fixes solution<br/>with error context"]
    FIXSOL --> RUN
    FIX -->|No| FAIL
  end
  PARALLEL --> ALL{"All 6 languages<br/>passed?"}
  ALL -->|No| RETRY{"Challenge attempts<br/>remaining? (max 3)"}
  RETRY -->|Yes| C
  RETRY -->|No| ERR["🚨 Pipeline failed<br/>no challenge today"]
  ALL -->|Yes| BENCH
  subgraph BENCH["🤖 PROMPTS 8–13 — Benchmark Runner Generation + Timing (6 languages, in parallel)"]
    direction TB
    BR["🤖 AI generates benchmark runner<br/>for this language"]
    BR --> BRUN["▶ docker run<br/>AI solution · benchmark case injected<br/>1000 iterations · 1.0 CPU · 256 MB (all languages equal)"]
    BRUN --> BRES{"BENCHMARK_RESULT<br/>received?"}
    BRES -->|No — retry once| BR
    BRES -->|Yes| BAVG["Store avg ms<br/>for scoreboard"]
  end
  BENCH --> SAVE["💾 Save to SQLite<br/>challenge · test runners · benchmark runners<br/>validated solutions · AI benchmark times"]
  style A fill:#7c3aed,stroke:#7c3aed,color:#fff
  style SAVE fill:#22c55e,stroke:#22c55e,color:#fff
  style ERR fill:#ef4444,stroke:#ef4444,color:#fff
  style PARALLEL fill:#0f172a,stroke:#334155,color:#fff
  style BENCH fill:#0f172a,stroke:#334155,color:#fff
  style C fill:#1e3a5f,stroke:#3b82f6,color:#fff
```
Code Injection — How Test Cases Reach the Runner
The AI generates a test runner once with a {{USER_CODE}} placeholder. At execution time, the backend injects everything:
| Layer | Mechanism |
|---|---|
| User solution | {{USER_CODE}} text-replaced into the runner before execution |
| Test cases (Python / JS / TS) | {{TEST_CASES_B64}} placeholder replaced with base64-encoded JSON in the runner source |
| Test cases (Rust) | JSON piped to the binary via stdin after compilation |
| Test cases (C / C++) | Hardcoded as native typed arrays in the runner (AI writes them verbatim) |
| Benchmark case (Python / JS / TS) | {{BENCHMARK_CASE_B64}} placeholder, same mechanism |
| Benchmark case (Rust) | JSON piped via stdin |
| Benchmark case (C / C++) | Hardcoded in the benchmark runner |
This eliminates data transcription errors — the AI never writes test values into runner code for injected languages.
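A minimal sketch of that injection step for the base64 path (the helper name and toy template below are illustrative assumptions, not the actual executor.ts code):

```typescript
// Sketch: inject user code and base64-encoded test cases into a stored
// runner template. Names and template shape are assumptions for illustration.
interface TestCase {
  input: unknown;
  expected: unknown;
}

function buildRunner(template: string, userCode: string, testCases: TestCase[]): string {
  // The user's solution is text-replaced into the runner before execution.
  const withCode = template.replace("{{USER_CODE}}", userCode);
  // Test cases travel as base64-encoded JSON, so the AI never transcribes
  // test values into runner source (the Python / JS / TS path).
  const b64 = Buffer.from(JSON.stringify(testCases)).toString("base64");
  return withCode.replace("{{TEST_CASES_B64}}", b64);
}

// Hypothetical usage with a toy JavaScript runner template:
const template = `const cases = JSON.parse(Buffer.from("{{TEST_CASES_B64}}", "base64").toString());
{{USER_CODE}}
for (const c of cases) console.log(solve(c.input) === c.expected ? "CASE_PASSED" : "CASE_FAILED");`;

const runner = buildRunner(template, "function solve(x) { return x * 2; }", [
  { input: 2, expected: 4 },
]);
```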
Validation Guards
| Guard | What triggers it | Action |
|---|---|---|
| Missing {{USER_CODE}} | Runner generated without the placeholder | Throw → trigger runner regeneration |
| Corrupt runner (non-ASCII chars in error) | AI leaked non-English text into compiled code | Regenerate runner (max 1 attempt) |
| Identical error after fix | Same failure output twice in a row | Give up early — likely a bad test case |
| Max fix attempts (2) | Solution still fails after 2 AI-assisted fixes | Language marked failed |
| Max challenge attempts (3) | Any language fails all fixes | Discard challenge, regenerate from scratch |
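The corrupt-runner guard boils down to a one-line check; a sketch (function name assumed, not taken from the codebase):

```typescript
// Sketch of the corrupt-runner guard: if a compiler or runtime error
// contains non-ASCII characters, the AI likely leaked non-English text
// into the generated runner, so the pipeline regenerates it.
function looksLikeCorruptRunner(errorOutput: string): boolean {
  // Any code point above 0x7F counts as non-ASCII.
  return /[^\x00-\x7F]/.test(errorOutput);
}
```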
What Gets Stored (per challenge)
| Field | Description |
|---|---|
| title | Challenge name |
| description | Markdown-formatted problem statement |
| difficulty | Easy / Medium / Hard |
| testCases | JSON array of {input, expected} objects |
| stubs | Per-language starter code shown in the editor |
| solutions | AI-validated solutions (viewable after solving or exhausting tries) |
| languageTests | Per-language test runner scripts (with {{USER_CODE}} placeholder) |
| benchmarkTestCase | One large stress-test input (200–300 elements) for performance measurement |
| benchmarkRunners | Per-language benchmark runner scripts (1000-iteration timing loop, measures only solve()) |
| aiBenchmarks | Per-language AI solution avg execution time in ms (shown on scoreboard as 🤖 AI) |
🏃 User Submission Flow
No AI calls happen during user submissions. The test runners and benchmark runners are pre-generated and stored in SQLite — the backend just injects the user's code and runs them.
```mermaid
flowchart TD
  U["User writes code in Monaco editor<br/>clicks Run"] --> EXEC["POST /api/execute<br/>code · language · challengeId"]
  EXEC --> INJ["Backend injects user code + test cases<br/>into stored test runner<br/>{{USER_CODE}} → solution<br/>{{TEST_CASES_B64}} → base64 JSON (or stdin for Rust)"]
  INJ --> D1["▶ docker run --rm<br/>--network none · --cap-drop ALL<br/>0.5–1.0 CPU · 60s timeout · --log-driver none"]
  D1 --> OUT["Runner streams:<br/>CASE_PASSED · TESTS_PROGRESS X/Y<br/>ALL_TESTS_PASSED · or error"]
  OUT --> P1{"ALL_TESTS_PASSED?"}
  P1 -->|No| E1["Return error + progress<br/>User sees which test failed"]
  P1 -->|Yes| NICK["Frontend prompts for nickname<br/>(stored in localStorage)"]
  NICK --> SUB["POST /api/submissions<br/>nickname · language · code · tries · solveTimeMs"]
  SUB --> VERIFY["Backend re-runs tests in Docker<br/>(security re-verification)"]
  VERIFY --> P2{"Still passes?"}
  P2 -->|No| E2["Reject submission"]
  P2 -->|Yes| BENCHINJ["Inject user code + benchmark case<br/>into stored benchmark runner<br/>(same injection mechanism)"]
  BENCHINJ --> D2["▶ docker run --rm<br/>--network none · --cap-drop ALL<br/>1.0 CPU · 256 MB · 5min timeout · --log-driver none<br/>(equal for ALL languages — fair comparison)"]
  D2 --> BRES["Runner outputs:<br/>BENCHMARK_INPUT_SIZE n<br/>BENCHMARK_RESULT avg_ms<br/>(1000 iterations · only solve() measured)"]
  BRES --> STORE["Store submission<br/>avgRunTimeMs · nickname · language · tries"]
  STORE --> MODAL["Return { id, avgRunTimeMs }<br/>UI shows scoreboard:<br/>all submissions ranked by avgRunTimeMs ASC<br/>🤖 AI entries from aiBenchmarks"]
  style U fill:#7c3aed,stroke:#7c3aed,color:#fff
  style MODAL fill:#22c55e,stroke:#22c55e,color:#fff
  style E1 fill:#7f1d1d,stroke:#ef4444,color:#fff
  style E2 fill:#7f1d1d,stroke:#ef4444,color:#fff
```
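The stdout markers the runners emit (TESTS_PROGRESS, ALL_TESTS_PASSED, BENCHMARK_RESULT) can be parsed with a small helper. A sketch with assumed names; the real backend parsing may differ:

```typescript
// Sketch: parse the marker lines a runner streams on stdout.
// The RunSummary shape and parseRunnerOutput name are assumptions.
interface RunSummary {
  passed: number;
  total: number;
  allPassed: boolean;
  avgRunTimeMs?: number; // only present after a benchmark run
}

function parseRunnerOutput(stdout: string): RunSummary {
  const summary: RunSummary = { passed: 0, total: 0, allPassed: false };
  for (const line of stdout.split("\n")) {
    const progress = line.match(/^TESTS_PROGRESS (\d+)\/(\d+)/);
    if (progress) {
      summary.passed = Number(progress[1]);
      summary.total = Number(progress[2]);
    } else if (line.startsWith("ALL_TESTS_PASSED")) {
      summary.allPassed = true;
    } else {
      const bench = line.match(/^BENCHMARK_RESULT ([\d.]+)/);
      if (bench) summary.avgRunTimeMs = Number(bench[1]);
    }
  }
  return summary;
}
```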
🔒 Secure Docker Isolation
User-submitted code runs in ephemeral Docker containers with multiple layers of security:
Security Flags
| Flag | Purpose |
|---|---|
| --rm | Container is automatically deleted after execution |
| --network none | No internet access — code cannot make outbound requests |
| --cap-drop ALL | All Linux capabilities dropped (no ptrace, chown, kill, etc.) |
| --memory 128m / 256m | Hard memory limit (128 MB for Python/JS test runs, 256 MB for TS/Rust/C++/C and all benchmarks) |
| --cpus 0.5 / 1.0 | CPU limit (0.5 for Python/JS test runs, 1.0 for TS/Rust/C++/C; all languages get 1.0 for benchmarks for fair comparison) |
| --log-driver none | Docker logging disabled for ephemeral containers — stdout/stderr captured directly via Node.js pipes |
| Non-root user | All containers run as the leetdle user, never root |
| 60s / 5min timeout | Backend calls docker kill <name> then kills the child process if the container hasn't exited (60s for testing, 5min for benchmarks) |
| Unique container names | Each container gets a UUID-based name (leetdle-exec-* / leetdle-bench-*) so it can be force-killed on timeout |
| Base64 code transport | Code is base64-encoded before being passed to the container — prevents shell injection |
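Put together, the flags above translate into a docker run argument list along these lines. A sketch only: the helper name, parameters, and argument ordering are assumptions, and the real executor.ts also wires up the timeout kill:

```typescript
import { randomUUID } from "node:crypto";

// Sketch: assemble the hardened `docker run` argument list described in the
// table. Resource values follow the table; the helper itself is hypothetical.
function buildDockerArgs(image: string, encodedCode: string, benchmark: boolean): string[] {
  // Unique name so the backend can `docker kill <name>` on timeout.
  const name = `leetdle-${benchmark ? "bench" : "exec"}-${randomUUID()}`;
  return [
    "run", "--rm",
    "--name", name,
    "--network", "none",     // no outbound network access
    "--cap-drop", "ALL",     // drop every Linux capability
    "--memory", benchmark ? "256m" : "128m",
    "--cpus", benchmark ? "1.0" : "0.5",
    "--log-driver", "none",  // capture stdout/stderr via pipes instead
    image,
    encodedCode,             // base64 transport prevents shell injection
  ];
}
```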
Executor Images
Each language has its own minimal Alpine-based Docker image:
| Image | Base | Pre-installed / Pre-warmed |
|---|---|---|
| leetdle-python | alpine:latest | python3 |
| leetdle-javascript | alpine:latest | nodejs |
| leetdle-typescript | alpine:latest | nodejs, npm, typescript, ts-node |
| leetdle-rust | rust:alpine | musl-dev, pre-compiled serde / serde_json / itertools deps |
| leetdle-cpp | alpine:latest | g++, musl-dev, pre-warmed bits/stdc++.h headers (C++17) |
| leetdle-c | alpine:latest | gcc, musl-dev, pre-warmed common C headers (C17) |
These images are pre-built by Docker Compose but never run as daemons — they only serve as base images for the ephemeral docker run containers spawned by the backend.
🌐 Frontend ↔ Backend Communication
Server-Side (SSR)
The Next.js page.tsx fetches today's challenge server-side at render time:
page.tsx → GET http://backend:5000/api/challenge/daily → renders GameClient
This uses Docker's internal DNS (backend hostname) since both containers share the leetdle_web bridge network.
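A sketch of that server-side fetch (BACKEND_URL and the injectable fetchFn parameter are illustrative assumptions, not the actual page.tsx code):

```typescript
// Sketch: server-side fetch of the daily challenge. The fetchFn parameter
// is an assumption added here so the helper can be exercised without a
// running backend.
const BACKEND_URL = process.env.BACKEND_URL ?? "http://backend:5000";

async function getDailyChallenge(fetchFn: typeof fetch = fetch): Promise<unknown> {
  // Inside the compose network, the `backend` hostname resolves via
  // Docker's internal DNS on the shared bridge network.
  const res = await fetchFn(`${BACKEND_URL}/api/challenge/daily`, { cache: "no-store" });
  if (!res.ok) throw new Error(`backend responded ${res.status}`);
  return res.json();
}
```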
Client-Side (Browser)
The GameClient.tsx component makes API calls from the browser via Next.js rewrites:
| Action | Method | Endpoint |
|---|---|---|
| Run code | POST | /api/execute |
| Save submission | POST | /api/submissions |
| Get challenge stats | GET | /api/submissions/stats/:id |
| Get scoreboard | GET | /api/submissions/scoreboard/:id |
| Get solution | GET | /api/challenge/solution/:id |
| List all challenges | GET | /api/challenge/all |
The next.config.js rewrites /api/* → http://backend:5000/api/*, proxying browser requests through the frontend container.
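That rewrite rule, sketched in the shape Next.js' rewrites() convention expects (the actual next.config.js contents are assumed, not reproduced):

```typescript
// Sketch: the /api/* → backend proxy rule in Next.js rewrites() form.
async function rewrites() {
  return [
    {
      source: "/api/:path*",
      destination: "http://backend:5000/api/:path*", // Docker-internal hostname
    },
  ];
}
```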
🚀 Getting Started
Prerequisites
- Docker & Docker Compose
- An OpenRouter API key — https://openrouter.ai
Setup
```bash
# 1. Clone the repository
git clone <repo-url> && cd leetdle

# 2. Create a .env file
cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY

# 3. Build and start everything
docker compose up --build -d

# 4. Open in your browser
# http://localhost:3000
```
On first startup the backend automatically generates today's challenge using the AI pipeline. This takes ~3–5 minutes as it validates solutions and runs performance benchmarks across all 6 languages.
Useful Commands
```bash
docker compose logs -f backend     # Watch backend logs (challenge generation progress)
docker compose build executor_rust # Rebuild a specific executor image
```
Project Structure
```text
leetdle/
├── frontend/                      # Next.js 14 app
│   ├── app/
│   │   ├── page.tsx               # SSR entry — fetches daily challenge
│   │   ├── GameClient.tsx         # Main game UI (Monaco editor, timer, modals)
│   │   ├── completionProviders.ts # Monaco autocomplete per language
│   │   ├── StatsButton.tsx        # Landing page stats component
│   │   └── challenges/            # Previous challenges page
│   ├── public/                    # Language SVG icons
│   └── Dockerfile
├── backend/                       # Hono + Node.js API
│   └── src/
│       ├── index.ts               # App entrypoint
│       ├── routes/                # API route handlers
│       ├── services/
│       │   ├── ai.ts              # OpenRouter integration (challenge + runner generation)
│       │   ├── executor.ts        # Docker container orchestration + code injection
│       │   ├── challenge.ts       # Challenge CRUD + validation pipeline
│       │   └── submission.ts      # Submission tracking & stats
│       ├── jobs/
│       │   └── fetcher.ts         # Midnight scheduler (Europe/Paris)
│       └── db/
│           ├── schema.ts          # Drizzle ORM schema
│           └── index.ts           # Database connection + migrations
├── executor/                      # Language-specific Docker images
│   ├── python/Dockerfile
│   ├── javascript/Dockerfile
│   ├── typescript/Dockerfile
│   ├── rust/Dockerfile
│   ├── cpp/Dockerfile
│   └── c/Dockerfile
├── data/                          # Persisted SQLite database (created at runtime)
├── docker-compose.yml
└── .env.example
```