# Shoe Detection Model Training Guide

## Current main approach: YOLOv8s-640 + foot ROI training

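The project's main training direction has been changed to:
- Train only `yolov8s`, with the input size fixed at `640x640`
- Stop training directly on whole scene images or on close-up shots of shoes
- First crop "foot ROI" images around the shoe boxes, so the inputs are closer to the online input distribution, and then train the shoe detector on those crops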

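The reason is that the online pipeline does not look for shoes in the whole frame. Instead, it:
1. Generates a foot ROI from each person box
2. Runs shoe detection on that foot ROI

Training therefore imitates this input distribution as closely as possible, keeping some trouser cuffs, floor, and surrounding background so that the training samples do not look like product close-ups.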

### ROI rules

Single-shoe ROI:
- Given a shoe box `(x, y, w, h)`, where `(x, y)` is the top-left corner
- `roi_x = x - 0.6w`
- `roi_y = y - 0.5h`
- `roi_w = 2.2w`
- `roi_h = 2.4h`

Two-shoe ROI:
- Prefer cropping both shoes into the same ROI
- Take the union of the two shoe boxes first, then expand it:
  - `roi_x = union_x - 0.35 * union_w`
  - `roi_y = union_y - 0.45 * union_h`
  - `roi_w = 1.7 * union_w`
  - `roi_h = 2.0 * union_h`

The crop is automatically clipped to the image bounds.
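The ROI rules above can be sketched in Python as follows. This is a minimal illustration, not the code inside `09_build_roi_shoe_dataset.py`: it assumes pixel boxes `(x, y, w, h)` with `(x, y)` at the top-left corner, and it rounds the clipped ROI to integer pixels. The function names are hypothetical.

```python
# Hypothetical sketch of the foot-ROI rules; not the project's actual script.
# Assumption: boxes are (x, y, w, h) in pixels with (x, y) at the top-left corner.
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]
Roi = Tuple[int, int, int, int]


def clip_roi(x: float, y: float, w: float, h: float, img_w: int, img_h: int) -> Roi:
    """Round an expanded box to integer pixels and clip it to the image bounds."""
    x0, y0 = max(0, round(x)), max(0, round(y))
    x1, y1 = min(img_w, round(x + w)), min(img_h, round(y + h))
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


def single_shoe_roi(box: Box, img_w: int, img_h: int) -> Roi:
    """Single-shoe ROI: 0.6w margin left and right, 0.5h above, 0.9h below."""
    x, y, w, h = box
    return clip_roi(x - 0.6 * w, y - 0.5 * h, 2.2 * w, 2.4 * h, img_w, img_h)


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that contains every input box."""
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    return x0, y0, x1 - x0, y1 - y0


def pair_shoe_roi(a: Box, b: Box, img_w: int, img_h: int) -> Roi:
    """Two-shoe ROI: expand the union of both shoe boxes."""
    ux, uy, uw, uh = union_box([a, b])
    return clip_roi(ux - 0.35 * uw, uy - 0.45 * uh, 1.7 * uw, 2.0 * uh, img_w, img_h)


if __name__ == "__main__":
    # A 100x50 shoe box at (500, 800) in a 1920x1080 frame.
    print(single_shoe_roi((500, 800, 100, 50), 1920, 1080))
    # Two shoes side by side.
    print(pair_shoe_roi((100, 500, 40, 40), (160, 510, 40, 30), 1920, 1080))
```

The two example calls print `(440, 775, 220, 120)` and `(65, 482, 170, 80)`. Note that the single-shoe rule pads symmetrically in width but asymmetrically in height (0.5h above, 0.9h below), which keeps more floor than trouser leg in the crop.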

### New main pipeline

1. Prepare the raw single-class shoe dataset

```bash
python 01_download_dataset.py --source openimages --max-samples 5000
python 05_prepare_ppe_shoe_subset.py
```

2. Build the ROI training set

```bash
python 09_build_roi_shoe_dataset.py --clean
```

Output directory:
- `datasets/shoe-roi-mix`

3. Train the new ROI model

```bash
12_train_roi_yolov8s_640.bat
```

Model output directory:
- `runs/roi_yolov8s_640`

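Notes:
- The new model is written to a new project directory and does not overwrite any existing model
- If a `train_roi` run directory already exists, Ultralytics automatically increments the run directory name instead of overwriting it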

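## Approach: single 640x640 model (2 windows at deployment)

**Training**:
- Input: full images at 640x640
- Model: YOLOv8s
- Output: a model with a 640x640 input

**Deployment** (pipeline configuration):
- Original frame: 1920x1080
- Split into 2 windows of 960x1080
- Resize each window to 640x640 and feed it to the model
- Merge the detection results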
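The README does not say how the results of the two windows are merged. The sketch below shows one possible approach under two assumptions that the pipeline config does not state: each window is stretched (not letterboxed) to `dst_w x dst_h`, and each detection is `(x1, y1, x2, y2, score)` in model-input pixels. Boxes are scaled back per axis (960/640 = 1.5 horizontally, 1080/640 = 1.6875 vertically), offset by the window origin, and deduplicated with a simple greedy NMS. All names are hypothetical.

```python
# Hypothetical sketch: map per-window detections back to the 1920x1080 frame and merge them.
# Assumptions: stretch resize (no letterbox); detections are (x1, y1, x2, y2, score).
from typing import List, Sequence, Tuple

Window = Tuple[int, int, int, int]              # (x, y, w, h) in the original frame
Det = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def to_frame(dets: Sequence[Det], win: Window, dst_w: int = 640, dst_h: int = 640) -> List[Det]:
    """Undo the per-axis resize, then shift by the window origin."""
    wx, wy, ww, wh = win
    sx, sy = ww / dst_w, wh / dst_h
    return [(wx + x1 * sx, wy + y1 * sy, wx + x2 * sx, wy + y2 * sy, s)
            for x1, y1, x2, y2, s in dets]


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge(per_window: Sequence[Sequence[Det]], windows: Sequence[Window],
          iou_thr: float = 0.5) -> List[Det]:
    """Concatenate all windows' boxes in frame coordinates, then apply greedy NMS."""
    dets = [d for ds, win in zip(per_window, windows) for d in to_frame(ds, win)]
    dets.sort(key=lambda d: d[4], reverse=True)
    kept: List[Det] = []
    for d in dets:
        if all(iou(d, k) < iou_thr for k in kept):
            kept.append(d)
    return kept


if __name__ == "__main__":
    windows = [(0, 0, 960, 1080), (960, 0, 960, 1080)]
    print(merge([[(100, 100, 200, 200, 0.9)], [(0, 0, 640, 640, 0.8)]], windows))
```

Because the two windows do not overlap, a shoe that straddles x = 960 can come back as two partial boxes with low IoU, so NMS alone will not rejoin them; overlapping windows or a box-fusion step would be needed for that case.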

---

## Directory structure

```
train/
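├── README.md                      # This file
├── 01_download_dataset.py         # Download a shoe dataset (Open Images recommended)
├── 02_train.bat                   # One-click training script for Windows
├── 03_export_onnx.bat             # Export to ONNX
├── 04_convert_rknn.py             # Convert to RKNN
├── 05_prepare_ppe_shoe_subset.py  # Extract the single-class PPE shoe subset
├── 06_finetune_ppe.bat            # Second-stage fine-tuning on the PPE shoe subset
├── 09_build_roi_shoe_dataset.py   # Build the foot-ROI training set
├── 12_train_roi_yolov8s_640.bat   # Train the ROI YOLOv8s-640 model
├── data.yaml.template             # Dataset configuration template
└── samples/                       # Sample images
    ├── calibration/
    ├── test_images/
    └── README.md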
```

---

## Quick start

### 1. Download the dataset

```bash
cd train
python 01_download_dataset.py --source openimages --max-samples 5000
```

### 2. Prepare the configuration

The download script generates `datasets/openimages-shoes-yolo/data.yaml` automatically.
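For reference, a minimal single-class Ultralytics `data.yaml` could be produced as below. This is a sketch, not the logic of `01_download_dataset.py`: the `images/train` and `images/val` sub-directory names are assumptions, and `path` is written as an absolute path so the file does not depend on how relative paths get resolved.

```python
# Hypothetical sketch of a single-class Ultralytics data.yaml; the directory layout is assumed.
from pathlib import Path
from typing import Sequence


def render_data_yaml(root: str, names: Sequence[str] = ("shoe",)) -> str:
    """Return the text of a data.yaml that maps class index 0 to 'shoe'."""
    lines = [
        f"path: {Path(root).resolve().as_posix()}",  # absolute dataset root
        "train: images/train",                       # assumed layout
        "val: images/val",                           # assumed layout
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


def write_data_yaml(root: str) -> Path:
    """Write data.yaml into the dataset root and return its path."""
    out = Path(root) / "data.yaml"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_data_yaml(root), encoding="utf-8")
    return out


if __name__ == "__main__":
    print(render_data_yaml("datasets/openimages-shoes-yolo"), end="")
```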

### 3. Train (640x640)

```bash
02_train.bat
```

Or run it manually:
```bash
yolo detect train \
    data=datasets/openimages-shoes-yolo/data.yaml \
    model=yolov8s.pt \
    epochs=150 \
    imgsz=640 \
    batch=16 \
    device=0
```

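**Training parameters**:
- Model: YOLOv8s (a balance of speed and accuracy)
- Input: 640x640
- Estimated time: about 30-60 minutes (varies with the GPU and dataset size)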

### 4. Export ONNX

```bash
03_export_onnx.bat
```

### 5. Convert to RKNN

On an Ubuntu PC:
```bash
python 04_convert_rknn.py runs/detect/train/weights/best.onnx -o shoe_detector_640.rknn -t rk3588
```

### 6. Deploy (2-window configuration)

Copy the model to the RK3588:
```bash
scp shoe_detector_640.rknn orangepi@<rk3588_ip>:/home/orangepi/apps/OrangePi3588Media/models/
```

Pipeline configuration (2 windows at deployment):
```json
{
  "id": "pre_shoe",
  "type": "preprocess",
  "windows": [
    {"x": 0, "y": 0, "w": 960, "h": 1080},
    {"x": 960, "y": 0, "w": 960, "h": 1080}
  ],
  "dst_w": 640,
  "dst_h": 640
}
```
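If the input resolution or the number of windows changes, the `windows` list can be derived instead of typed by hand. The helper below is hypothetical and only reproduces the fields shown in the `pre_shoe` JSON; it assumes equal-width, full-height, non-overlapping windows, with the last window absorbing any remainder pixels.

```python
# Hypothetical helper that rebuilds the pre_shoe preprocess node.
# Assumption: equal-width, full-height, non-overlapping windows.
import json
from typing import Any, Dict, List


def make_preprocess_node(frame_w: int = 1920, frame_h: int = 1080, n_windows: int = 2,
                         dst_w: int = 640, dst_h: int = 640) -> Dict[str, Any]:
    base_w = frame_w // n_windows
    windows: List[Dict[str, int]] = []
    for i in range(n_windows):
        x = i * base_w
        # The last window absorbs any pixels left over by the integer division.
        w = frame_w - x if i == n_windows - 1 else base_w
        windows.append({"x": x, "y": 0, "w": w, "h": frame_h})
    return {"id": "pre_shoe", "type": "preprocess", "windows": windows,
            "dst_w": dst_w, "dst_h": dst_h}


if __name__ == "__main__":
    print(json.dumps(make_preprocess_node(), indent=2))
```

With the defaults, the helper yields the same two 960x1080 windows as the configuration above.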

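### 7. Option A: PPE second-stage fine-tuning

Once the Open Images base model has been trained, you can further fine-tune it on the PPE shoe subset to adapt it to the target scene: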

```bash
python 05_prepare_ppe_shoe_subset.py
06_finetune_ppe.bat
```

Source classes of the PPE shoe subset:
- `boots`
- `no_boots`

Both classes are mapped to a single class:
- `shoe`
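The remapping can be sketched as follows for YOLO-format label files (one `class cx cy w h` line per box). The class IDs used for `boots` and `no_boots` below are placeholders: read the real IDs from the PPE dataset's own `data.yaml`. Dropping every other class is an assumption about how `05_prepare_ppe_shoe_subset.py` behaves, not a description of it.

```python
# Hypothetical sketch: collapse the PPE shoe classes into the single class 0 (shoe).
from typing import Dict, List, Sequence

# Placeholder IDs; look up the real ones for "boots" and "no_boots" in the PPE data.yaml.
PPE_TO_SHOE: Dict[int, int] = {3: 0, 7: 0}


def remap_label_lines(lines: Sequence[str], mapping: Dict[int, int]) -> List[str]:
    """Rewrite the class ID of mapped boxes and drop every other box."""
    out: List[str] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        if cls in mapping:
            out.append(" ".join([str(mapping[cls])] + parts[1:]))
    return out


if __name__ == "__main__":
    src = ["3 0.5 0.5 0.1 0.2", "1 0.2 0.2 0.1 0.1", "7 0.4 0.6 0.1 0.1"]
    print(remap_label_lines(src, PPE_TO_SHOE))
```

An image whose label list ends up empty can either be dropped or kept as a background sample; the sketch leaves that decision to the caller.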

---

## Class notes (Open Images)

In the official Open Images class hierarchy, the subclasses of `Footwear` include:
- `Boot`
- `Sandal`
- `High heels`
- `Roller skates`

Recommended downloads for this project:
- `Footwear`
- `Boot`

Optional additions:
- `Sandal`

Not recommended by default:
- `High heels`
- `Roller skates`

During training, all of these are mapped to a single class:
- `0: shoe`

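This keeps the model's objective focused: first detect shoes as reliably as possible, then decide in post-processing whether a shoe is black.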
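Open Images stores boxes as normalized `XMin`, `XMax`, `YMin`, `YMax` values. Assuming the class name has already been resolved from its `LabelName` ID, converting one box into a single-class YOLO label line could look like the hypothetical helper below. `Sandal` is left out of the default set because it is only an optional addition.

```python
# Hypothetical sketch: convert one Open Images box into a single-class YOLO label line.
from typing import Iterable, Optional

DEFAULT_CLASSES = frozenset({"Footwear", "Boot"})  # add "Sandal" if desired


def oi_box_to_yolo(class_name: str, xmin: float, xmax: float, ymin: float, ymax: float,
                   keep: Iterable[str] = DEFAULT_CLASSES) -> Optional[str]:
    """Return 'class cx cy w h' (normalized, class 0 = shoe), or None for other classes."""
    if class_name not in set(keep):
        return None
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return f"0 {cx:.6f} {cy:.6f} {xmax - xmin:.6f} {ymax - ymin:.6f}"


if __name__ == "__main__":
    print(oi_box_to_yolo("Boot", 0.2, 0.4, 0.5, 0.9))
    print(oi_box_to_yolo("High heels", 0.1, 0.3, 0.6, 0.8))
```

The first call prints `0 0.300000 0.700000 0.200000 0.400000`; the second prints `None` because `High heels` is not in the default set.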

---

## Related links

- [Open Images dataset](https://storage.googleapis.com/openimages/web/index.html)
- [Ultralytics YOLOv8](https://docs.ultralytics.com/)