docs: add artifact loop engine usage

sladro 2026-04-02 13:48:56 +08:00
parent b19c07e0dd
commit f9ccb42d6b


@ -6,6 +6,17 @@
The idea: give an AI agent a small but real LLM training setup and let it experiment autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards, and repeats. You wake up in the morning to a log of experiments and (hopefully) a better model. The training code here is a simplified single-GPU implementation of [nanochat](https://github.com/karpathy/nanochat). The core idea is that you're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the `program.md` Markdown files that provide context to the AI agents and set up your autonomous research org. The default `program.md` in this repo is intentionally kept as a bare-bones baseline, though it's obvious how one would iterate on it over time to find the "research org code" that achieves the fastest research progress, how you'd add more agents to the mix, etc. A bit more context on this project is in [this tweet](https://x.com/karpathy/status/2029701092347630069) and [this tweet](https://x.com/karpathy/status/2031135152349524125).

The repo also includes a generic Artifact Loop Engine for editable text artifacts such as prompts, skills, config files, and small code paths. It applies the same iterate-evaluate-repeat pattern to these artifacts and writes structured iteration results to `work/results.jsonl`.

Engine concepts:
- **`artifacts`** — the editable inputs the task is allowed to change.
- **`runner`** — executes an iteration over the selected artifact set.
- **`scorer`** — evaluates each iteration and records the outcome.
- **`policy`** — decides what to keep, discard, or try next.
The task spec schema also includes a `mutation` section, but mutation-budget enforcement is reserved for a future baseline-aware orchestration layer and is not yet applied by the current CLI loop.
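
To make the shape of a task spec concrete, here is a minimal sketch of what a `task.yaml` along these lines could look like. The four top-level sections mirror the concepts above; the nested keys and values are illustrative assumptions, not the engine's actual schema, so treat `tasks/skill-quality/task.yaml` as the source of truth:

```yaml
# Hypothetical task spec sketch. The section names follow the engine
# concepts above; all nested keys/values are illustrative assumptions.
artifacts:
  - skills/summarize.md          # editable inputs the task may change
runner:
  command: uv run python scripts/run_eval.py   # hypothetical runner command
scorer:
  metric: rubric_score           # assumed metric name
policy:
  keep_if: score_improved        # assumed keep/discard rule
mutation:
  budget: 20                     # in the schema, but not yet enforced by the CLI loop
```
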
## How it works
The repo is deliberately kept small and only really has three files that matter:
@ -35,10 +46,15 @@ uv run prepare.py
# 4. Manually run a single training experiment (~5 min)
uv run train.py
# 5. Run the Artifact Loop Engine task runner
uv run python scripts/run_task.py --task tasks/skill-quality/task.yaml
```
If the above commands all work ok, your setup is working and you can go into autonomous research mode.
The task runner writes structured iteration results to `work/results.jsonl`.
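Each line of `work/results.jsonl` is a single JSON object describing one iteration. As a rough sketch (the field names here are assumptions; inspect the file for the real schema), a record could look like:

```json
{"iteration": 3, "artifact": "skills/summarize.md", "score": 0.82, "kept": true}
```
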
## Running the agent
Simply spin up your Claude/Codex or whatever you want in this repo (and disable all permissions), then you can prompt something like: