# Baseline-Aware Single-Iteration Orchestrator Usage

## What It Does

The Artifact Loop Engine now runs one baseline-aware optimization iteration in a sandbox:

1. Load a task spec.
2. Snapshot the current accepted artifact baseline.
3. Copy the repo into a temporary sandbox.
4. Run the task mutator in the sandbox.
5. Validate candidate changes against mutation limits.
6. Run and score the candidate in the sandbox.
7. Keep or discard the candidate.
8. Sync back only allowed artifact files on `keep`.

The main workspace stays unchanged on `discard` and `crash`.
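
The control flow above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `run_iteration`, `IterationResult`, and the callable parameters are hypothetical names, and the sketch assumes that a higher score is better and that a missing baseline leads to `keep` (as in the example result in Quick Start).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IterationResult:
    status: str  # "keep", "discard", or "crash"
    reason: str
    candidate_score: Optional[float] = None


def run_iteration(
    mutate: Callable[[], None],
    validate: Callable[[], Optional[str]],
    run_and_score: Callable[[], float],
    baseline_score: Optional[float],
    sync_back: Callable[[], None],
) -> IterationResult:
    """Hypothetical single iteration; every callable operates on the sandbox."""
    try:
        mutate()
        # validate() returns a reason string when a mutation limit is violated.
        problem = validate()
        if problem is not None:
            return IterationResult("discard", problem)
        score = run_and_score()
    except Exception as exc:
        # Any mutator, runner, or scorer failure is reported as a crash.
        return IterationResult("crash", str(exc))

    # Assumption: higher scores are better, and no baseline means "keep".
    if baseline_score is None:
        sync_back()
        return IterationResult("keep", "no baseline available", score)
    if score > baseline_score:
        sync_back()
        return IterationResult("keep", "candidate improved", score)
    return IterationResult("discard", "candidate did not improve", score)
```

Note that `sync_back` is only reached on the `keep` paths, which is what leaves the main workspace untouched on `discard` and `crash`. The revalidation that happens before sync-back is omitted here for brevity.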

## Quick Start

Run the sample task from the repo root:

```bash
uv run python scripts/run_task.py --task tasks/skill-quality/task.yaml
```

Expected behavior:

- The command prints one JSON record to stdout.
- A matching JSON line is appended to `work/results.jsonl`.
- `tasks/skill-quality/fixtures/SKILL.md` is updated only if the candidate is kept.

Example result:

```json
{"task_id":"skill-quality","status":"keep","reason":"no baseline available","candidate_score":4.0,"diff_summary":""}
```

## Task Schema

A task file must include these sections:

- `id`
- `description`
- `artifacts`
- `mutation`
- `mutator`
- `runner`
- `scorer`
- `objective`
- `constraints`
- `policy`
- `budget`
- `logging`
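
A quick way to check a task file for the sections listed above, once it has been loaded into a dictionary. This is only a sketch: `missing_sections` is a hypothetical helper, and how the engine itself reports a missing section is not shown here.

```python
REQUIRED_SECTIONS = (
    "id", "description", "artifacts", "mutation", "mutator", "runner",
    "scorer", "objective", "constraints", "policy", "budget", "logging",
)


def missing_sections(task: dict) -> list[str]:
    """Return the required top-level sections that are absent from a task."""
    return [name for name in REQUIRED_SECTIONS if name not in task]
```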

Important runtime fields:

```yaml
mutator:
  type: command
  command: "python ../../scripts/mutate_skill_task.py --task-dir . --artifact fixtures/SKILL.md"
  cwd: "tasks/skill-quality"
  timeout_seconds: 30

runner:
  command: "python ../../scripts/evaluate_skill_task.py --task-dir . --artifact fixtures/SKILL.md --output ../../work/skill-run.json"
  cwd: "tasks/skill-quality"
  timeout_seconds: 30

scorer:
  type: command
  command: "python scripts/score_skill_task.py --input work/skill-run.json"
  timeout_seconds: 30
  parse:
    format: json
    score_field: score
    metrics_field: metrics
```
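
To illustrate the `parse` block, here is a minimal sketch of how scorer stdout might be read with `format: json`. The function name and the choice to treat a missing metrics field as an empty dictionary are assumptions, not the engine's documented behavior.

```python
import json


def parse_scorer_output(
    stdout: str,
    score_field: str = "score",
    metrics_field: str = "metrics",
) -> tuple[float, dict]:
    """Extract the score and metrics from a scorer's JSON output (hypothetical)."""
    payload = json.loads(stdout)  # raises json.JSONDecodeError on malformed output
    if not isinstance(payload, dict):
        raise ValueError("scorer output must be a JSON object")
    if score_field not in payload:
        raise ValueError(f"scorer output is missing {score_field!r}")
    score = float(payload[score_field])
    # Assumption: metrics are optional and default to an empty mapping.
    metrics = payload.get(metrics_field, {})
    return score, metrics
```

An unparseable payload raises here, which corresponds to the "scorer output was not parseable" crash case.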

## Path Rules

- `task.root_dir` is the directory containing `task.yaml`.
- `artifacts.include` paths are resolved relative to the task directory.
- `mutator.cwd` and `runner.cwd` are repo-relative paths.
- Absolute `cwd` values are rejected.
- `..` segments in `cwd` are rejected.
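
The two `cwd` rejection rules can be expressed in a few lines. The sketch below assumes POSIX-style paths, as used in the sample task; `validate_cwd` is a hypothetical name.

```python
from pathlib import PurePosixPath


def validate_cwd(cwd: str) -> PurePosixPath:
    """Accept only repo-relative cwd values without `..` segments."""
    path = PurePosixPath(cwd)
    if path.is_absolute():
        raise ValueError(f"absolute cwd is not allowed: {cwd}")
    if ".." in path.parts:
        raise ValueError(f"cwd must not contain '..' segments: {cwd}")
    return path
```

For example, `validate_cwd("tasks/skill-quality")` passes, while `validate_cwd("/tmp")` and `validate_cwd("tasks/../..")` raise `ValueError`.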

## Keep, Discard, Crash

### Keep

- Candidate is accepted.
- Only allowed artifact files are copied back into the main workspace.
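
A sync-back step along these lines could look like the sketch below. `sync_back_artifacts` and its parameters are hypothetical, and the list of allowed paths is assumed to be already resolved relative to the repo root.

```python
import shutil
from pathlib import Path


def sync_back_artifacts(
    sandbox_root: Path,
    workspace_root: Path,
    allowed: list[str],
) -> list[str]:
    """Copy only the allowed artifact files from the sandbox to the workspace."""
    copied = []
    for rel in allowed:
        src = sandbox_root / rel
        if not src.is_file():
            continue  # nothing to sync for this artifact
        dst = workspace_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel)
    return copied
```

Because the loop iterates over the allowed list rather than over the sandbox contents, any other file the candidate touched never reaches the main workspace.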

### Discard

- Candidate is rejected.
- Main workspace remains unchanged.

### Crash

- Mutator, runner, or scorer execution failed.
- Main workspace remains unchanged.
- CLI exits non-zero.

## Validation Rules

The orchestrator rejects a candidate before runner execution when:

- the changed file count exceeds `artifacts.max_files_per_iteration`
- the changed line count exceeds `mutation.max_changed_lines`
- a changed file has a type that is not allowed
- a non-artifact file was mutated

The orchestrator also revalidates artifact state before sync-back on `keep`, so later runner or scorer edits cannot bypass mutation limits.
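
The checks above might be combined as in the following sketch. The `Limits` container, the `allowed_suffixes` and `artifact_paths` fields, and the order in which the checks run are all assumptions made for illustration; only `max_files_per_iteration` and `max_changed_lines` correspond to field names in the task schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_files_per_iteration: int
    max_changed_lines: int
    allowed_suffixes: frozenset[str]  # hypothetical, e.g. {".md"}
    artifact_paths: frozenset[str]    # hypothetical, repo-relative artifact files


def first_violation(changed: dict[str, int], limits: Limits) -> Optional[str]:
    """Return a discard reason, or None if the candidate is within limits.

    `changed` maps each changed file path to its changed line count.
    """
    if len(changed) > limits.max_files_per_iteration:
        return "changed file count exceeds limit"
    if sum(changed.values()) > limits.max_changed_lines:
        return "changed line count exceeds limit"
    for path in sorted(changed):
        if path not in limits.artifact_paths:
            return f"non-artifact file was mutated: {path}"
        if PurePosixPath(path).suffix not in limits.allowed_suffixes:
            return f"file type is not allowed: {path}"
    return None
```

Running the same function once after the mutator and again before sync-back would give the revalidation behavior described above.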

## Repo Directories Ignored By Sandbox State Checks

These repo-root directories are intentionally ignored during sandbox copy/hash validation:

- `work`
- `.venv`
- `.pytest_cache`

Reason:

- They are runtime or cache state, not accepted source artifacts.
- Including them can distort the validation that decides whether a candidate can be kept, or make real runs unnecessarily slow.
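
As an illustration of how such directories can be skipped during a hash-based state check, the sketch below computes a digest over a repo tree while ignoring the three directories at the repo root only. The `tree_digest` function and the use of SHA-256 are assumptions, not a description of the engine's actual hashing.

```python
import hashlib
from pathlib import Path

IGNORED_ROOT_DIRS = frozenset({"work", ".venv", ".pytest_cache"})


def tree_digest(repo_root: Path) -> str:
    """Hash relative paths and file contents, skipping ignored root directories."""
    digest = hashlib.sha256()
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(repo_root)
        # Only a directory directly under the repo root is ignored;
        # a nested directory with the same name is still hashed.
        if len(rel.parts) > 1 and rel.parts[0] in IGNORED_ROOT_DIRS:
            continue
        digest.update(rel.as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

With this approach, a runner writing to `work/` does not change the digest, while any edit elsewhere in the tree does.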

## Output Record

Each CLI run appends one JSON line to `work/results.jsonl` with:

- `task_id`
- `status`
- `reason`
- `candidate_score`
- `diff_summary`
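
Since the log is one JSON object per line, the most recent record can be read with a few lines of Python. `latest_result` is a hypothetical helper for inspection, not part of the CLI.

```python
import json
from pathlib import Path
from typing import Optional


def latest_result(path: Path) -> Optional[dict]:
    """Return the last JSON record in a results file, or None if there is none."""
    if not path.exists():
        return None
    lines = [
        line
        for line in path.read_text(encoding="utf-8").splitlines()
        if line.strip()  # tolerate blank lines
    ]
    if not lines:
        return None
    return json.loads(lines[-1])
```

For example, `latest_result(Path("work/results.jsonl"))` returns a dictionary with the five fields listed above after at least one run.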

## Recommended Workflow For Adding A New Task

1. Create a task directory under `tasks/`.
2. Define the artifact set narrowly.
3. Set conservative mutation limits first.
4. Add a deterministic mutator command.
5. Add a deterministic runner and scorer.
6. Run `scripts/run_task.py` directly.
7. Inspect the latest line in `work/results.jsonl`.

## Common Failure Cases

### `status = "discard"`

Usually means:

- mutation budget exceeded
- disallowed file type
- non-artifact change detected
- candidate did not improve

### `status = "crash"`

Usually means:

- mutator command failed
- runner command failed
- scorer command failed
- scorer output was not parseable
- configured `cwd` does not exist in the sandbox

## Current Scope

This implementation supports exactly one isolated optimization iteration.

It does not yet implement:

- multi-iteration search
- parallel candidate execution
- git-backed sandboxing
- branch-per-candidate workflows