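Fixing the YOLOv8 FP16 Output Parsing Error

Problem Description

Symptom: With a YOLOv8 RKNN model (e.g. yolov8n-640.rknn, best-640.rknn), no objects are detected, the tracker reports tracks=0, and the model output is garbage.

Characteristic error log: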

[ai_yolo] First box: x=125969024.000000, y=38730874880.000000, w=0.000000, h=0.000000
[ai_yolo] ProcessOutputV8 result: valid_count=8400 out of 8400 boxes

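All 8400 anchors pass the threshold check
Coordinates are absurdly large numbers
The detection score is score=5561747627709562880.000000 (a garbage value, not a valid confidence score)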

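Root Cause Analysis

1. Data type mismatch

The output data type of the RKNN model does not match the way the code parses it:

Model	RKNN output type	Original handling in code	Result
YOLOv5	INT8 (quantized)	`int8_t*` + dequantization	✅ OK
YOLOv8	FP16 (half precision)	`reinterpret_cast<float*>`	❌ Wrong

Problematic code: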

// ai_yolo_node.cpp, lines 591-592 (before the fix)
if (outputs[0].type == RKNN_TENSOR_FLOAT32 ||
    outputs[0].type == RKNN_TENSOR_FLOAT16) {
    // Both types are parsed as float32, so FP16 data is misinterpreted
    valid_count = ProcessOutputV8(reinterpret_cast<float*>(...), ...);
}

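2. Why FP16 cannot be parsed directly as FP32

FP16 (half precision): 16 bits = 1 sign bit + 5 exponent bits + 10 mantissa bits
FP32 (single precision): 32 bits = 1 sign bit + 8 exponent bits + 23 mantissa bits

When the memory is reinterpreted as FP32, each pair of adjacent FP16 values is fused into a single FP32 value. The sign, exponent, and mantissa fields no longer line up, so every decoded number is wrong.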
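
To make the effect concrete, here is a minimal sketch that fuses two FP16 bit patterns into one 32-bit word and reads it back as FP32. The helper name `ReinterpretHalfPairAsFloat` is hypothetical and not part of the node; it builds the word explicitly in little-endian order so that the result does not depend on the host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration: fuses two adjacent FP16 bit patterns
// into one 32-bit word the way a little-endian CPU would see them in memory,
// then reads that word as an FP32 value.
inline float ReinterpretHalfPairAsFloat(uint16_t first, uint16_t second) {
    // On little-endian targets the first half lands in the low 16 bits.
    const uint32_t bits = static_cast<uint32_t>(first) |
                          (static_cast<uint32_t>(second) << 16);
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// 0x3800 encodes 0.5 and 0x5D00 encodes 320.0 in FP16. Fused together they
// form 0x5D003800, which decodes as an FP32 value of roughly 5.77e17.
```

Neither 0.5 nor 320.0 survives: the exponent bits of the second half end up in the FP32 exponent field, which is why the logged coordinates and scores are so large.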

3. Why YOLOv5 works

YOLOv5 RKNN models use INT8 quantization by default, and the code already contains dequantization logic for it:

DequantizeAffineToF32(int8_t qnt, int32_t zp, float scale)

YOLOv8 RKNN models, by contrast, default to FP16 output, and the code had no FP16→FP32 conversion.
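
For reference, affine dequantization is normally computed as real = (q - zero_point) * scale. The source only shows the signature, so the body below is an assumption based on that formula and not a copy of the project's code.

```cpp
#include <cstdint>

// Assumed body: standard affine dequantization,
// real_value = (quantized_value - zero_point) * scale.
inline float DequantizeAffineToF32(int8_t qnt, int32_t zp, float scale) {
    return (static_cast<float>(qnt) - static_cast<float>(zp)) * scale;
}
```

With zp = -128 and scale = 0.5, the quantized value 10 maps to (10 + 128) * 0.5 = 69.0. Because this path never touches floating-point bit patterns in the output buffer, it is unaffected by the FP16 problem.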

Solution

1. Add an FP16 to FP32 conversion function

// ai_yolo_node.cpp
#include <cstdint>
#include <cstring>

// FP16 (half) to FP32 conversion
// IEEE 754 half-precision: 1 sign bit, 5 exponent bits, 10 mantissa bits
inline float Fp16ToFp32(uint16_t h) {
    uint32_t sign = (h >> 15) & 0x1;
    uint32_t exp = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;

    uint32_t f;
    if (exp == 0) {
        // Zero or subnormal
        if (mant == 0) {
            f = (sign << 31);  // Signed zero
        } else {
            // Subnormal: shift until the implicit leading 1 (bit 10) is set.
            // exp is unsigned and may wrap below zero here; adding the bias
            // difference 112 (127 - 15) brings it back into range.
            exp = 1;
            while ((mant & 0x400) == 0) {
                mant <<= 1;
                exp--;
            }
            mant &= 0x3FF;
            f = (sign << 31) | ((exp + 112) << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        // Infinity or NaN
        f = (sign << 31) | (0xFFu << 23) | (mant << 13);
    } else {
        // Normal number: rebias the exponent from 15 to 127
        f = (sign << 31) | ((exp + 112) << 23) | (mant << 13);
    }

    // memcpy avoids the undefined behavior of pointer-cast type punning
    float result;
    std::memcpy(&result, &f, sizeof(float));
    return result;
}
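
One way to sanity-check a bit-level converter like this is to compare it against an arithmetic reference. The sketch below is such a reference; `HalfToFloatReference` is a hypothetical name, and it computes the value with `std::ldexp` instead of assembling FP32 bits.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference decoder: computes the value arithmetically instead
// of assembling FP32 bits, so it can cross-check a bit-level implementation.
inline float HalfToFloatReference(uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exp = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;

    float magnitude;
    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24
        magnitude = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 0x1F) {
        // All-ones exponent: infinity if the mantissa is zero, NaN otherwise
        magnitude = (mant == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1 + mant/1024) * 2^(exp-15) == (1024 + mant) * 2^(exp-25)
        magnitude = std::ldexp(static_cast<float>(1024 + mant), exp - 25);
    }
    return negative ? -magnitude : magnitude;
}
```

Useful patterns to compare are 0x3C00 (1.0), 0xC000 (-2.0), 0x7BFF (65504, the largest finite half), 0x0001 (2^-24, the smallest subnormal), and 0x7C00 (+infinity). Since there are only 65536 possible inputs, an exhaustive comparison over every non-NaN pattern is also cheap.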

2. Handle the FP16 branch separately

// ai_yolo_node.cpp, PostProcessBorrowed()
if (outputs[0].type == RKNN_TENSOR_FLOAT32) {
    // FP32: parse directly
    valid_count = ProcessOutputV8(reinterpret_cast<float*>(...), ...);
} else if (outputs[0].type == RKNN_TENSOR_FLOAT16) {
    // FP16: convert into the FP32 buffer first
    size_t num_elements = outputs[0].size / sizeof(uint16_t);
    fp32_buffer_.resize(num_elements);
    const uint16_t* fp16_data = reinterpret_cast<const uint16_t*>(outputs[0].data);
    for (size_t i = 0; i < num_elements; ++i) {
        fp32_buffer_[i] = Fp16ToFp32(fp16_data[i]);
    }
    valid_count = ProcessOutputV8(fp32_buffer_.data(), ...);
} else {
    // INT8: dequantize
    valid_count = ProcessOutputV8Int8(reinterpret_cast<int8_t*>(...), ...);
}
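
If the conversion loop is factored out of the branch, it could look like the sketch below. `ConvertHalfBuffer` is a hypothetical helper, not something in the node: it rejects null or odd-sized buffers and reads each element with `memcpy`, so it makes no assumption about buffer alignment. The converter is passed in to keep the snippet self-contained; in the node it would be `Fp16ToFp32`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: converts a raw FP16 byte buffer into `out`.
// `convert` is any callable taking uint16_t and returning float.
template <typename Convert>
bool ConvertHalfBuffer(const void* data, size_t size_bytes,
                       std::vector<float>& out, Convert convert) {
    if (data == nullptr || size_bytes % sizeof(uint16_t) != 0) {
        return false;  // Reject null or truncated buffers.
    }
    const size_t count = size_bytes / sizeof(uint16_t);
    out.resize(count);  // Reuses existing capacity across frames.
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    for (size_t i = 0; i < count; ++i) {
        uint16_t half;
        std::memcpy(&half, bytes + i * sizeof(uint16_t), sizeof(half));
        out[i] = convert(half);
    }
    return true;
}
```

Keeping the output vector as a member, as the next step does, means `resize` only allocates on the first frame as long as the tensor size stays constant.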

3. Add an FP32 buffer member variable

class AiYoloNode : public INode {
    // ...
#if defined(RK3588_ENABLE_RKNN)
    ModelHandle model_handle_ = kInvalidModelHandle;
    uint32_t n_output_ = 0;
    std::vector<uint8_t> rgb_tmp_;
    std::vector<float> fp32_buffer_;  // For FP16 to FP32 conversion
#endif
};

Verification after the fix

Normal log:

[ai_yolo] First box: x=4.632812, y=6.921875, w=11.078125, h=14.296875
[ai_yolo] ProcessOutputV8 result: valid_count=23 out of 8400 boxes
[tracker] id=trk_cam1 tracks=1 created=1 removed=0 matched=153 unmatch_det=1

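Coordinates are within the expected range (0-640)
The detection count is reasonable (23/8400)
The tracker works normally (tracks=1)
The detection score is normal (0.67)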
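
The checks above could also be automated as a cheap runtime guard, so that a future type mismatch is caught immediately. The following is only a sketch: the function name `LooksLikeMisparsedOutput` and the 4x slack factor are illustrative assumptions, not part of the node.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical guard; the name and thresholds are illustrative only.
// Flags output where every box passed the threshold or where the first box
// lies implausibly far outside the model input.
inline bool LooksLikeMisparsedOutput(size_t valid_count, size_t total_boxes,
                                     float first_x, float first_y,
                                     float input_size) {
    const bool every_box_passed = total_boxes > 0 && valid_count == total_boxes;
    const bool coords_finite = std::isfinite(first_x) && std::isfinite(first_y);
    // Allow generous slack around the input size before flagging a box.
    const float limit = 4.0f * input_size;
    const bool coords_plausible = coords_finite &&
                                  std::fabs(first_x) <= limit &&
                                  std::fabs(first_y) <= limit;
    return every_box_passed || !coords_plausible;
}
```

Fed with the values from the two logs in this document, the guard flags the pre-fix output (8400 of 8400 boxes, x=125969024) and accepts the post-fix output (23 of 8400 boxes, x=4.632812).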

Scope of impact

✅ YOLOv5 (INT8): unaffected, continues to work
✅ YOLOv8 (FP16): works after the fix
✅ YOLOv8 (FP32): unaffected
✅ YOLOv8 (INT8): unaffected

References

RKNN API data type definitions: rknn_tensor_type in rknn_api.h
- RKNN_TENSOR_FLOAT32 = 0
- RKNN_TENSOR_FLOAT16 = 1
- RKNN_TENSOR_INT8 = 2
IEEE 754 half-precision floating-point standard
