Baseline-Aware Single-Iteration Orchestrator Usage
What It Does
The Artifact Loop Engine now runs one baseline-aware optimization iteration in a sandbox:
- Load a task spec.
- Snapshot the current accepted artifact baseline.
- Copy the repo into a temporary sandbox.
- Run the task mutator in the sandbox.
- Validate candidate changes against mutation limits.
- Run and score the candidate in the sandbox.
- Keep or discard the candidate.
- Sync back only allowed artifact files on keep.
The main workspace stays unchanged on discard and crash.
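The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: `run_iteration`, the `mutate` and `score` callables, and the assumption that a higher score is better are all hypothetical, and the validation step is left out for brevity.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_iteration(
    repo: Path,
    artifacts: Sequence[str],
    mutate: Callable[[Path], None],
    score: Callable[[Path], float],
    baseline_score: Optional[float],
) -> dict:
    """Run one sandboxed iteration and return a keep/discard/crash record."""
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "repo"
        shutil.copytree(repo, sandbox)  # the main workspace is never mutated directly
        try:
            mutate(sandbox)
            candidate = score(sandbox)
        except Exception as exc:
            # Crash: nothing has been copied back, so the workspace is untouched.
            return {"status": "crash", "reason": str(exc)}

        # Assumption: higher scores are better.
        if baseline_score is not None and candidate <= baseline_score:
            return {
                "status": "discard",
                "reason": "candidate did not improve",
                "candidate_score": candidate,
            }

        # Keep: sync back only the allowed artifact files.
        for rel in artifacts:
            shutil.copy2(sandbox / rel, repo / rel)
        reason = "no baseline available" if baseline_score is None else "candidate improved"
        return {"status": "keep", "reason": reason, "candidate_score": candidate}
```

Because every write happens inside the temporary directory, discard and crash need no cleanup in the main workspace.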
Quick Start
Run the sample task from the repo root:
uv run python scripts/run_task.py --task tasks/skill-quality/task.yaml
Expected behavior:
- The command prints one JSON record to stdout.
- A matching JSON line is appended to work/results.jsonl.
- tasks/skill-quality/fixtures/SKILL.md is updated only if the candidate is kept.
Example result:
{"task_id":"skill-quality","status":"keep","reason":"no baseline available","candidate_score":4.0,"diff_summary":""}
Task Schema
A task file must include these sections:
- id
- description
- artifacts
- mutation
- mutator
- runner
- scorer
- objective
- constraints
- policy
- budget
- logging
Important runtime fields:
mutator:
  type: command
  command: "python ../../scripts/mutate_skill_task.py --task-dir . --artifact fixtures/SKILL.md"
  cwd: "tasks/skill-quality"
  timeout_seconds: 30
runner:
  command: "python ../../scripts/evaluate_skill_task.py --task-dir . --artifact fixtures/SKILL.md --output ../../work/skill-run.json"
  cwd: "tasks/skill-quality"
  timeout_seconds: 30
scorer:
  type: command
  command: "python scripts/score_skill_task.py --input work/skill-run.json"
  timeout_seconds: 30
  parse:
    format: json
    score_field: score
    metrics_field: metrics
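To illustrate how the scorer's parse settings might be applied, here is a small sketch. It assumes the scorer prints a single JSON object on stdout; the function name `parse_scorer_output` and the exact error handling are hypothetical and not taken from the engine.

```python
import json
from typing import Dict, Tuple


def parse_scorer_output(
    stdout: str,
    score_field: str = "score",
    metrics_field: str = "metrics",
) -> Tuple[float, Dict]:
    """Extract the score and optional metrics from the scorer's JSON output."""
    payload = json.loads(stdout)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(payload, dict):
        raise ValueError("scorer output must be a JSON object")
    if score_field not in payload:
        raise ValueError(f"scorer output is missing field {score_field!r}")
    score = payload[score_field]
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError(f"field {score_field!r} must be a number")
    metrics = payload.get(metrics_field) or {}
    return float(score), dict(metrics)
```

Under this sketch, any `ValueError` raised here would correspond to the "scorer output was not parseable" crash case.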
Path Rules
- task.root_dir is the directory containing task.yaml.
- artifacts.include paths are resolved relative to the task directory.
- mutator.cwd and runner.cwd are repo-relative paths.
- Absolute cwd values are rejected.
- .. segments in cwd are rejected.
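The two cwd rejection rules can be expressed as a short check. This is a sketch under the assumption that cwd values use POSIX-style separators; `validate_cwd` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def validate_cwd(cwd: str) -> str:
    """Return the normalized repo-relative cwd, or raise ValueError."""
    path = PurePosixPath(cwd)
    if path.is_absolute():
        raise ValueError(f"absolute cwd is not allowed: {cwd}")
    # pathlib does not collapse "..", so any such segment is still visible in parts.
    if ".." in path.parts:
        raise ValueError(f"'..' segments are not allowed in cwd: {cwd}")
    return str(path)
```

Note that only cwd is restricted this way; the sample commands themselves still use `../../` to reach scripts from the task directory.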
Keep, Discard, Crash
Keep
- Candidate is accepted.
- Only allowed artifact files are copied back into the main workspace.
Discard
- Candidate is rejected.
- Main workspace remains unchanged.
Crash
- Mutator, runner, or scorer execution failed.
- Main workspace remains unchanged.
- CLI exits non-zero.
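A calling script can tell the three outcomes apart by combining the exit code with the JSON record on stdout. The sketch below is one way to do that; `run_and_classify` is a hypothetical helper, and it assumes the record is the last non-empty stdout line and that keep and discard exit with code zero, which the text above implies but does not state.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def run_and_classify(argv: List[str]) -> Tuple[str, Optional[dict]]:
    """Run a command and classify the outcome as keep, discard, crash, or unknown."""
    proc = subprocess.run(argv, capture_output=True, text=True)

    # Try to parse the last non-empty stdout line as the JSON record.
    record = None
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    if lines:
        try:
            parsed = json.loads(lines[-1])
            if isinstance(parsed, dict):
                record = parsed
        except json.JSONDecodeError:
            record = None

    if proc.returncode != 0:
        return "crash", record  # a non-zero exit always signals a crash
    if record is None:
        return "unknown", None  # exited cleanly but printed no usable record
    return str(record.get("status", "unknown")), record


# Example call (commented out so the sketch has no side effects):
# status, record = run_and_classify(
#     ["uv", "run", "python", "scripts/run_task.py",
#      "--task", "tasks/skill-quality/task.yaml"]
# )
```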
Validation Rules
The orchestrator rejects a candidate before runner execution when:
- changed file count exceeds artifacts.max_files_per_iteration
- changed line count exceeds mutation.max_changed_lines
- changed file type is not allowed
- a non-artifact file was mutated
The orchestrator also revalidates artifact state before sync-back on keep, so later runner or scorer edits cannot bypass mutation limits.
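The four rejection rules could be checked along these lines. This is an illustrative sketch only: the `Limits` dataclass, the `allowed_suffixes` and `artifact_paths` fields, and the shape of the `changed` mapping are assumptions, not the orchestrator's real data model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, Optional, Set


@dataclass
class Limits:
    max_files_per_iteration: int  # mirrors artifacts.max_files_per_iteration
    max_changed_lines: int        # mirrors mutation.max_changed_lines
    allowed_suffixes: Set[str]    # hypothetical, e.g. {".md"}
    artifact_paths: Set[str]      # hypothetical set of repo-relative artifact paths


def validate_candidate(changed: Dict[str, int], limits: Limits) -> Optional[str]:
    """Return a rejection reason, or None if the candidate passes.

    `changed` maps each changed repo-relative path to its changed line count.
    """
    if len(changed) > limits.max_files_per_iteration:
        return "changed file count exceeds artifacts.max_files_per_iteration"
    if sum(changed.values()) > limits.max_changed_lines:
        return "changed line count exceeds mutation.max_changed_lines"
    for path in sorted(changed):
        if PurePosixPath(path).suffix not in limits.allowed_suffixes:
            return f"disallowed file type: {path}"
        if path not in limits.artifact_paths:
            return f"non-artifact change detected: {path}"
    return None
```

Calling the same function once before the runner and once more before sync-back would give the revalidation behavior described above.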
Repo Directories Ignored By Sandbox State Checks
These repo-root directories are intentionally ignored during sandbox copy/hash validation:
- work
- .venv
- .pytest_cache
Reason:
- They are runtime or cache state, not accepted source artifacts.
- Including them can distort keepability validation or make real runs unnecessarily slow.
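One way to skip these directories only at the repo root during the sandbox copy is a custom `ignore` callback for `shutil.copytree`. This is a sketch of the idea rather than the engine's implementation; `copy_repo_to_sandbox` and `IGNORED_ROOT_DIRS` are hypothetical names.

```python
import os
import shutil
from pathlib import Path

IGNORED_ROOT_DIRS = {"work", ".venv", ".pytest_cache"}


def copy_repo_to_sandbox(repo_root: Path, dest: Path) -> None:
    """Copy the repo tree to dest, skipping ignored directories at the root only."""
    root = os.path.abspath(repo_root)

    def ignore(directory: str, names: list) -> list:
        # copytree calls this once per directory; filter only at the top level,
        # so a nested directory that happens to be named "work" is still copied.
        if os.path.abspath(directory) == root:
            return [name for name in names if name in IGNORED_ROOT_DIRS]
        return []

    shutil.copytree(repo_root, dest, ignore=ignore)
```

`shutil.ignore_patterns("work", ".venv", ".pytest_cache")` would be shorter, but it matches those names at every depth, not just the repo root.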
Output Record
Each CLI run appends one JSON line to work/results.jsonl with:
- task_id
- status
- reason
- candidate_score
- diff_summary
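Appending a record and reading back the latest one takes only a few lines. The helpers below are a hedged sketch: `append_record` and `read_latest_record` are hypothetical names, and the required-field check reflects only the five fields listed above.

```python
import json
from pathlib import Path
from typing import Optional

REQUIRED_FIELDS = ("task_id", "status", "reason", "candidate_score", "diff_summary")


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line, creating the file if needed."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_latest_record(path: Path) -> Optional[dict]:
    """Return the last non-empty JSON line as a dict, or None if the file is empty."""
    lines = [line for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
    return json.loads(lines[-1]) if lines else None
```

For example, `read_latest_record(Path("work/results.jsonl"))` returns the record from the most recent run.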
Recommended Workflow For Adding A New Task
- Create a task directory under tasks/.
- Define the artifact set narrowly.
- Set conservative mutation limits first.
- Add a deterministic mutator command.
- Add a deterministic runner and scorer.
- Run scripts/run_task.py directly.
- Inspect the latest line in work/results.jsonl.
Common Failure Cases
status = "discard"
Usually means:
- mutation budget exceeded
- disallowed file type
- non-artifact change detected
- candidate did not improve
status = "crash"
Usually means:
- mutator command failed
- runner command failed
- scorer command failed
- scorer output was not parseable
- configured cwd does not exist in the sandbox
Current Scope
This implementation supports exactly one isolated optimization iteration.
It does not yet implement:
- multi-iteration search
- parallel candidate execution
- git-backed sandboxing
- branch-per-candidate workflows