Add track-aware face alarm design doc

tian 2026-04-15 12:30:11 +08:00
parent 9700bb85bb
commit ec6818b206


@@ -0,0 +1,480 @@
# Face Recognition Track-Aware Alarm Design
## 1. Background
The current face recognition alarm path is:
`face_det -> face_recog -> alarm.face_rules -> actions`
This path is already able to:
- recognize known persons from the gallery
- classify single-frame results as `known` or `unknown`
- generate `known_person` and `unknown_face` alarms
- upload snapshots and clips through the existing alarm action chain
However, current alarm behavior is still dominated by single-frame face recognition results. In workshop testing, the same known person can produce:
- `known_person` alarms on close, high-quality frames
- `unknown_face` alarms on far, small, or low-quality frames
This is not acceptable for the target workshop scenario. In this scenario:
- low-quality face observations can be ignored
- alarm precision is more important than alarm recall
- alarm frequency must stay low
- known-person alarms are used as attendance punch events
- short leave-and-return behavior should not generate repeated punch alarms
The repository already has a person tracker and shoe-related logic that rely on `track_id`. This design reuses that capability instead of introducing a second face-specific tracker.
## 2. Goals
### 2.1 Functional Goals
- Reuse existing person `track_id` for face identity aggregation.
- Stop treating `unknown` as the direct opposite of `known`.
- Ignore low-quality face observations instead of forcing them into `unknown_face`.
- Generate face alarms per tracked person instead of per single frame.
- Prevent duplicate alarms while the same person remains in the scene.
- Prevent immediate repeated alarms after a short leave-and-return event.
### 2.2 Non-Goals
- Do not replace the existing person tracker implementation.
- Do not introduce a new standalone face tracker.
- Do not change face embedding extraction or gallery search behavior.
- Do not merge detection, recognition, tracking, and alarming into one plugin.
## 3. Current State Review
### 3.1 What Already Exists
- `plugins/tracker/tracker_node.cpp`
  - assigns stable `track_id` values to person detections in `frame->det`
- `plugins/logic_gate/logic_gate_node.cpp`
  - already consumes person `track_id` for shoe-related reasoning
- `plugins/ai_face_det/*` and `plugins/ai_face_recog/ai_face_recog_node.cpp`
  - already produce face detection and recognition results
- `plugins/alarm/alarm_node.cpp`
  - already supports face-specific rules and the existing alarm action chain
### 3.2 Current Gap
The current face path does not actually reuse person tracking:
- `FaceDetItem` has a `track_id` field, but face detectors currently fill it with `-1`
- `FaceRecogItem` only carries gallery identity fields such as `best_person_id`
- face alarm vote keys currently use:
  - `best_person_id` for known-person rules
  - constant `unknown` for unknown-person rules
This means current face alarm behavior cannot answer:
- whether two frames belong to the same physical person
- whether a temporary low-score frame belongs to a person already recognized moments earlier
- whether an alarm should be suppressed because the person is still on screen
## 4. Design Summary
The design adds a track-aware identity aggregation layer on top of existing person tracking.
The revised behavior is:
1. Person detections continue to receive `track_id` from the existing tracker.
2. Each recognized face is associated with one tracked person in the same frame.
3. The associated `person_track_id` is stored on face recognition results.
4. Alarm logic aggregates recognition evidence per `person_track_id`.
5. The alarm decision becomes three-state:
   - `known`
   - `unknown`
   - `uncertain`
6. Only stable `known` or stable `unknown` states may trigger alarms.
7. `uncertain` observations are ignored.
This preserves the existing DAG architecture and keeps responsibilities separated:
- tracker tracks persons
- face plugins produce recognition evidence
- alarm node owns business-facing identity confirmation and deduplication
## 5. Data Flow
The intended runtime path becomes:
`person_det -> tracker -> face_det -> face_recog(face-person association) -> alarm(track-aware face rules) -> actions`
### 5.1 Face-to-Person Association
For each recognized face in a frame, associate it to one person detection from `frame->det` that already has a valid `track_id`.
Recommended matching order:
1. Prefer person boxes that contain the face center point.
2. If multiple person boxes qualify, choose the one with the highest overlap with the face box.
3. If no containing person box exists, optionally fall back to IoU / overlap ratio matching.
4. If no reliable match exists, leave the face unassociated.
This keeps the matching logic simple and aligned with the workshop camera scenario, where one face should usually lie inside one person box.
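
The matching order above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `Box`, `PersonDet`, `IoU`, and `AssociateFaceToPerson` are hypothetical names, not the repository's real types, and "overlap" is taken to mean IoU between the face box and the person box.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal types; the real detection structs in the repository may differ.
struct Box { float x, y, w, h; };             // top-left corner plus size, in pixels
struct PersonDet { Box box; int track_id; };  // track_id < 0 means "not tracked"

// Intersection-over-union of two axis-aligned boxes, in [0, 1].
inline float IoU(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float ih = std::max(0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = iw * ih;
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// True when the center point of the face box lies inside the person box.
inline bool ContainsFaceCenter(const Box& person, const Box& face) {
    const float cx = face.x + face.w * 0.5f;
    const float cy = face.y + face.h * 0.5f;
    return cx >= person.x && cx <= person.x + person.w &&
           cy >= person.y && cy <= person.y + person.h;
}

// Returns the track_id of the best matching tracked person, or -1 when no
// reliable match exists. Boxes containing the face center always outrank
// IoU-only fallback matches; ties within a group are broken by higher IoU.
inline int AssociateFaceToPerson(const Box& face,
                                 const std::vector<PersonDet>& persons,
                                 float min_iou) {
    int best_id = -1;
    float best_iou = -1.0f;
    bool best_contains = false;
    for (const PersonDet& p : persons) {
        if (p.track_id < 0) continue;  // only tracked persons are candidates
        const bool contains = ContainsFaceCenter(p.box, face);
        const float iou = IoU(face, p.box);
        if (!contains && iou < min_iou) continue;  // fallback match too weak
        const bool better = (contains && !best_contains) ||
                            (contains == best_contains && iou > best_iou);
        if (better) {
            best_id = p.track_id;
            best_iou = iou;
            best_contains = contains;
        }
    }
    return best_id;
}
```

A face left unassociated keeps the value `-1`, which matches the "leave the face unassociated" rule in step 4.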
### 5.2 Result Enrichment
Extend face recognition results with the associated person track metadata.
Recommended additions to `FaceRecogItem`:
- `int person_track_id = -1`
- optional future field: `float person_match_score`
This lets downstream plugins consume face identity evidence with person continuity information, without coupling them to raw person detections.
## 6. Identity State Model
Per `person_track_id`, maintain a short-lived identity aggregation state in the alarm node.
Recommended tracked fields:
- `track_id`
- `first_seen_ms`
- `last_seen_ms`
- `last_quality_pass_ms`
- `best_known_person_id`
- `best_known_name`
- `best_sim_peak`
- `best_known_hit_count`
- `quality_pass_count`
- `unknown_candidate_count`
- `reported_known`
- `reported_unknown`
- `last_report_ms`
The state exists only while the track is active, plus a short retention window needed for re-entry suppression.
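
One possible shape for this state is sketched below. The field names follow the list above; the container choice, the helper names `TouchTrack` and `EvictStaleTracks`, and the `retention_ms` parameter are assumptions made for illustration, not existing alarm-node code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-track aggregation state; field names mirror the design list.
struct FaceTrackState {
    int track_id = -1;
    std::int64_t first_seen_ms = 0;
    std::int64_t last_seen_ms = 0;
    std::int64_t last_quality_pass_ms = 0;
    int best_known_person_id = -1;
    std::string best_known_name;
    float best_sim_peak = 0.0f;
    int best_known_hit_count = 0;
    int quality_pass_count = 0;
    int unknown_candidate_count = 0;
    bool reported_known = false;
    bool reported_unknown = false;
    std::int64_t last_report_ms = 0;
};

using TrackStateMap = std::unordered_map<int, FaceTrackState>;

// Returns the state for track_id, creating it on first sight, and refreshes
// last_seen_ms so the retention logic knows the track is still active.
inline FaceTrackState& TouchTrack(TrackStateMap& states, int track_id,
                                  std::int64_t now_ms) {
    auto inserted = states.emplace(track_id, FaceTrackState{});
    FaceTrackState& s = inserted.first->second;
    if (inserted.second) {  // newly created entry
        s.track_id = track_id;
        s.first_seen_ms = now_ms;
    }
    s.last_seen_ms = now_ms;
    return s;
}

// Drops states not seen for longer than retention_ms (assumed parameter).
// Returns the number of removed entries.
inline std::size_t EvictStaleTracks(TrackStateMap& states, std::int64_t now_ms,
                                    std::int64_t retention_ms) {
    std::size_t removed = 0;
    for (auto it = states.begin(); it != states.end();) {
        if (now_ms - it->second.last_seen_ms > retention_ms) {
            it = states.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```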
## 7. Three-State Decision Model
### 7.1 States
Each person track may be in one of three states:
- `uncertain`
  - insufficient quality or insufficient evidence
- `known`
  - stable evidence for one known gallery identity
- `unknown`
  - stable evidence that the tracked person does not match any known identity
### 7.2 Why `uncertain` Is Required
`uncertain` is the key change for this scenario.
Examples that should remain `uncertain`:
- face too small
- poor alignment
- temporary blur
- far-distance observations
- unstable similarity fluctuations
- too few valid observations for the current track
These observations should not generate any identity alarm.
## 8. Quality Gating
Only quality-qualified face observations should participate in identity aggregation.
Recommended quality checks:
- associated `person_track_id >= 0`
- face area ratio above configured minimum
- face aspect ratio within configured bounds
- landmarks available when alignment is required
- optional minimum bbox size in pixels
- optional minimum confidence from face detection
If a frame fails quality gating:
- do not count it toward `unknown`
- do not count it toward `known`
- keep the track state as `uncertain`
This directly matches the workshop requirement: low-quality data can be ignored.
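
A minimal sketch of such a gate follows. `FaceObservation`, `QualityConfig`, and `PassesQualityGate` are hypothetical names, and the aspect-ratio bounds are assumed defaults that do not appear in the design's configuration; the remaining defaults mirror the proposed `quality` section.

```cpp
// Hypothetical quality thresholds. The first four defaults follow the proposed
// "quality" config section; the aspect-ratio bounds are assumptions.
struct QualityConfig {
    float min_face_area_ratio = 0.001f;
    int min_face_width = 32;
    int min_face_height = 32;
    bool require_landmarks = true;
    float min_aspect_ratio = 0.5f;  // assumed lower bound on width / height
    float max_aspect_ratio = 2.0f;  // assumed upper bound on width / height
};

// Hypothetical flattened view of one face observation in one frame.
struct FaceObservation {
    int person_track_id;
    int face_w, face_h;    // face bbox size in pixels
    int frame_w, frame_h;  // frame size in pixels
    bool has_landmarks;
};

// Returns true only when the observation may take part in identity
// aggregation. A failed gate contributes to neither known nor unknown.
inline bool PassesQualityGate(const FaceObservation& o, const QualityConfig& cfg) {
    if (o.person_track_id < 0) return false;
    if (o.face_w <= 0 || o.face_h <= 0) return false;
    if (o.frame_w <= 0 || o.frame_h <= 0) return false;
    if (o.face_w < cfg.min_face_width || o.face_h < cfg.min_face_height) return false;

    const float area_ratio = (static_cast<float>(o.face_w) * static_cast<float>(o.face_h)) /
                             (static_cast<float>(o.frame_w) * static_cast<float>(o.frame_h));
    if (area_ratio < cfg.min_face_area_ratio) return false;

    const float aspect = static_cast<float>(o.face_w) / static_cast<float>(o.face_h);
    if (aspect < cfg.min_aspect_ratio || aspect > cfg.max_aspect_ratio) return false;

    if (cfg.require_landmarks && !o.has_landmarks) return false;
    return true;
}
```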
## 9. Known-Person Confirmation
Known-person confirmation should require repeated evidence for the same gallery identity on the same tracked person.
Recommended conditions:
- face passed quality gating
- `best_person_id >= 0`
- `best_sim >= known_accept`
- `(best_sim - second_sim) >= known_margin`
- same `best_person_id` observed at least `known_min_hits` times inside `known_hit_window_ms`
Optional improvement:
- allow a peak-sim shortcut when `best_sim` is very high and consistent
Once the track reaches stable `known`:
- trigger `known_person`
- mark the track as `reported_known`
- suppress all later known alarms for the same active track
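
The confirmation rule can be illustrated with a sliding window of qualifying hits for one track. This is a sketch under assumptions: `KnownConfirmer`, `KnownConfig`, and `KnownHit` are hypothetical names, callers are assumed to pass only observations that already passed quality gating, and the optional peak-sim shortcut is not modeled.

```cpp
#include <cstdint>
#include <deque>

// Defaults mirror the proposed "known" config section.
struct KnownConfig {
    float accept = 0.45f;
    float margin = 0.05f;
    int min_hits = 3;
    std::int64_t hit_window_ms = 3000;
};

struct KnownHit {
    std::int64_t ts_ms;
    int person_id;
};

// Hypothetical per-track confirmer. One instance per person_track_id.
class KnownConfirmer {
public:
    explicit KnownConfirmer(KnownConfig cfg = KnownConfig{}) : cfg_(cfg) {}

    // Feed one quality-qualified observation. Returns the confirmed gallery
    // person id exactly once for this track, and -1 in every other case.
    int Observe(std::int64_t now_ms, int best_person_id, float best_sim, float second_sim) {
        if (reported_) return -1;  // active-track deduplication

        // Expire hits that fell out of the confirmation window.
        while (!hits_.empty() && now_ms - hits_.front().ts_ms > cfg_.hit_window_ms) {
            hits_.pop_front();
        }

        const bool qualifies = best_person_id >= 0 &&
                               best_sim >= cfg_.accept &&
                               (best_sim - second_sim) >= cfg_.margin;
        if (!qualifies) return -1;

        hits_.push_back(KnownHit{now_ms, best_person_id});

        // Count hits inside the window that agree on the same identity.
        int count = 0;
        for (const KnownHit& h : hits_) {
            if (h.person_id == best_person_id) ++count;
        }
        if (count >= cfg_.min_hits) {
            reported_ = true;
            return best_person_id;
        }
        return -1;
    }

private:
    KnownConfig cfg_;
    std::deque<KnownHit> hits_;
    bool reported_ = false;
};
```

Keeping the hits in a deque makes expiry cheap, and counting per identity means two different gallery candidates alternating on the same track never reach `min_hits` together.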
## 10. Unknown-Person Confirmation
Unknown-person confirmation must be stricter than known-person confirmation.
Unknown should not mean:
- "this frame is not known"
Unknown should mean:
- "this tracked person has been observed long enough, at sufficient quality, and still cannot be confirmed as any known person"
Recommended conditions:
- face passed quality gating
- valid `person_track_id`
- track age exceeds `unknown_min_track_age_ms`
- quality-qualified observations reach `unknown_min_quality_hits`
- no stable known identity has been confirmed for this track
- recognition remains below known confirmation thresholds during the window
Optional additional conditions:
- require the top candidate identity to remain inconsistent
- require multiple low-confidence or ambiguous frames before final unknown confirmation
Once the track reaches stable `unknown`:
- trigger `unknown_face`
- mark the track as `reported_unknown`
- suppress all later unknown alarms for the same active track
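
The required conditions can be sketched as a single predicate over accumulated track evidence. `TrackEvidence`, `UnknownConfig`, and `ShouldReportUnknown` are hypothetical names; the sketch assumes `quality_pass_count` counts only gated observations and that `known_confirmed` is set by the known-person path. The optional conditions are not modeled.

```cpp
#include <cstdint>

// Defaults mirror the proposed "unknown" config section.
struct UnknownConfig {
    std::int64_t min_track_age_ms = 2000;
    int min_quality_hits = 4;
};

// Hypothetical subset of the per-track state needed for this decision.
struct TrackEvidence {
    std::int64_t first_seen_ms = 0;
    int quality_pass_count = 0;     // observations that passed quality gating
    bool known_confirmed = false;   // set when the track reached stable known
    bool reported_unknown = false;  // set once unknown_face has fired
};

// Returns true exactly once per track, when the track is old enough, has
// enough quality-qualified observations, and was never confirmed as known.
inline bool ShouldReportUnknown(TrackEvidence& t, std::int64_t now_ms,
                                const UnknownConfig& cfg) {
    if (t.reported_unknown || t.known_confirmed) return false;
    if (now_ms - t.first_seen_ms < cfg.min_track_age_ms) return false;
    if (t.quality_pass_count < cfg.min_quality_hits) return false;
    t.reported_unknown = true;  // suppress later unknown alarms on this track
    return true;
}
```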
## 11. Alarm Deduplication and Re-Entry Control
### 11.1 Active-Track Deduplication
Within one active `person_track_id`:
- `known_person` may trigger at most once
- `unknown_face` may trigger at most once
### 11.2 Re-Entry Suppression
The workshop scenario treats known-person alarms as punch events. Therefore:
- if the same known employee remains on screen, do not re-alarm
- if the same known employee briefly leaves and re-enters, do not re-alarm immediately
Recommended suppression keys:
- known person: keyed by `gallery person_id`
- unknown person: keyed by recent track history or a future stronger fingerprint
Recommended timers:
- `known_reentry_cooldown_ms`
- `unknown_reentry_cooldown_ms`
Known-person suppression should be relatively long, because attendance punching should be sparse.
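
For the known-person case, re-entry suppression can be sketched as a cooldown table keyed by gallery person id, so that a new `track_id` for a returning employee does not reset the timer. `ReentrySuppressor` and `AllowAndRecord` are hypothetical names; the unknown-person key is left out because the design has not fixed it yet.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical cooldown table keyed by gallery person_id.
class ReentrySuppressor {
public:
    explicit ReentrySuppressor(std::int64_t cooldown_ms) : cooldown_ms_(cooldown_ms) {}

    // Returns true when an alarm for person_id may fire at now_ms, and records
    // the report time. Returns false while the cooldown is still running.
    bool AllowAndRecord(int person_id, std::int64_t now_ms) {
        auto it = last_report_ms_.find(person_id);
        if (it != last_report_ms_.end() && now_ms - it->second < cooldown_ms_) {
            return false;  // same person reported too recently
        }
        last_report_ms_[person_id] = now_ms;
        return true;
    }

private:
    std::int64_t cooldown_ms_;
    std::unordered_map<int, std::int64_t> last_report_ms_;
};
```

A suppressed attempt does not refresh the timestamp, so the cooldown is measured from the last alarm that actually fired rather than from the last sighting.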
## 12. Configuration Design
Introduce a dedicated face track aggregation config section under the alarm face-rule path or a sibling alarm section.
Recommended fields:
```json
{
  "face_track_aggregation": {
    "enable": true,
    "associate_with_person_track": true,
    "require_person_track": true,
    "person_match_mode": "face_center_in_person",
    "person_match_min_iou": 0.05,
    "quality": {
      "min_face_area_ratio": 0.001,
      "min_face_width": 32,
      "min_face_height": 32,
      "require_landmarks": true
    },
    "known": {
      "accept": 0.45,
      "margin": 0.05,
      "min_hits": 3,
      "hit_window_ms": 3000,
      "reentry_cooldown_ms": 300000
    },
    "unknown": {
      "min_track_age_ms": 2000,
      "min_quality_hits": 4,
      "reentry_cooldown_ms": 300000
    }
  }
}
```
Notes:
- exact placement may be adjusted to match current config conventions
- existing face rule fields should remain supported where practical
- migration should minimize breaking existing configs
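
On the C++ side, the section above could map onto plain structs whose defaults equal the recommended values, so that a missing section falls back to the documented behavior. The struct names and the `ValidateConfig` helper are hypothetical, and the sanity rules it applies are assumptions rather than requirements stated by this design. JSON parsing is left to whatever config loader the repository already uses.

```cpp
#include <cstdint>
#include <string>

// Hypothetical mirrors of the proposed JSON section; defaults equal the
// recommended values so an absent field keeps the documented behavior.
struct FaceQualityCfg {
    float min_face_area_ratio = 0.001f;
    int min_face_width = 32;
    int min_face_height = 32;
    bool require_landmarks = true;
};

struct FaceKnownCfg {
    float accept = 0.45f;
    float margin = 0.05f;
    int min_hits = 3;
    std::int64_t hit_window_ms = 3000;
    std::int64_t reentry_cooldown_ms = 300000;
};

struct FaceUnknownCfg {
    std::int64_t min_track_age_ms = 2000;
    int min_quality_hits = 4;
    std::int64_t reentry_cooldown_ms = 300000;
};

struct FaceTrackAggregationCfg {
    bool enable = true;
    bool associate_with_person_track = true;
    bool require_person_track = true;
    std::string person_match_mode = "face_center_in_person";
    float person_match_min_iou = 0.05f;
    FaceQualityCfg quality;
    FaceKnownCfg known;
    FaceUnknownCfg unknown;
};

// Returns an empty string when the values look sane, otherwise a short
// description of the first problem found. The rules are assumed, not specified.
inline std::string ValidateConfig(const FaceTrackAggregationCfg& c) {
    if (c.person_match_min_iou < 0.0f || c.person_match_min_iou > 1.0f)
        return "person_match_min_iou must be in [0, 1]";
    if (c.quality.min_face_area_ratio < 0.0f || c.quality.min_face_area_ratio > 1.0f)
        return "quality.min_face_area_ratio must be in [0, 1]";
    if (c.known.min_hits < 1) return "known.min_hits must be >= 1";
    if (c.known.hit_window_ms <= 0) return "known.hit_window_ms must be > 0";
    if (c.known.margin < 0.0f) return "known.margin must be >= 0";
    if (c.unknown.min_quality_hits < 1) return "unknown.min_quality_hits must be >= 1";
    if (c.known.reentry_cooldown_ms < 0 || c.unknown.reentry_cooldown_ms < 0)
        return "reentry_cooldown_ms must be >= 0";
    return "";
}
```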
## 13. File-Level Changes
### 13.1 Data Model
Modify:
- `include/face/face_result.h`
Changes:
- add `person_track_id` to `FaceRecogItem`
- optionally add future-friendly metadata for association confidence
### 13.2 Face Recognition Node
Modify:
- `plugins/ai_face_recog/ai_face_recog_node.cpp`
Changes:
- associate each recognized face to a tracked person from `frame->det`
- write `person_track_id` into `FaceRecogItem`
- extend debug log output to include `person_track_id`
### 13.3 Alarm Node
Modify:
- `plugins/alarm/alarm_node.cpp`
Changes:
- add track-aware face identity aggregation state
- replace per-frame unknown alarm behavior with track-based unknown confirmation
- change face vote key logic to prefer `person_track_id`
- add deduplication and re-entry suppression based on track-aware identity state
### 13.4 Tests
Modify or add:
- platform-independent unit tests for association logic
- platform-independent unit tests for track-aware known confirmation
- platform-independent unit tests for track-aware unknown suppression
- platform-independent unit tests for re-entry cooldown behavior
## 14. Compatibility and Migration
The design should be introduced in a backward-compatible way.
Recommended compatibility strategy:
1. keep existing face recognition output fields unchanged
2. add new fields instead of renaming old ones
3. keep current config behavior available when track-aware aggregation is disabled
4. allow current face rules to coexist with the new aggregation mode during rollout
This enables:
- safer staged rollout
- easier comparison between old and new behavior
- simpler troubleshooting on RK3588
## 15. Validation Strategy
### 15.1 Local Code-Level Validation
Local validation should focus on platform-independent logic only:
- face-to-person association behavior
- aggregation state transitions
- known confirmation window logic
- unknown suppression logic
- re-entry cooldown behavior
- config parsing and backward compatibility
### 15.2 RK3588 Device-Side Validation
Final validation must be completed on RK3588:
- known person from far to near
  - expect delayed but stable known-person alarm
  - no unknown false alarm
- known person brief leave and quick re-entry
  - expect no repeated punch alarm
- known person long leave and re-entry after cooldown
  - expect one new punch alarm
- truly unknown person with adequate face quality
  - expect one unknown alarm after evidence accumulation
- low-quality unknown face
  - expect no alarm
- multiple persons in frame
  - verify face-person association uses the correct person track
## 16. Risks and Mitigations
### Risk 1: Incorrect face-person association
Impact:
- identity evidence may be attached to the wrong tracked person
Mitigation:
- start with simple center-in-box matching
- log `person_track_id` and association decisions in debug mode
- validate on multi-person RK3588 scenes
### Risk 2: Unknown confirmation becomes too conservative
Impact:
- unknown alarms may be delayed or reduced
Mitigation:
- make unknown thresholds configurable
- prefer under-reporting over false workshop alerts in early rollout
### Risk 3: Re-entry suppression too aggressive
Impact:
- valid repeated attendance events may be skipped
Mitigation:
- make re-entry cooldown configurable
- document business interpretation clearly as punch-style attendance
## 17. Rollout Recommendation
Recommended rollout order:
1. add `person_track_id` propagation and debug logs
2. add track-aware known confirmation
3. add conservative track-aware unknown confirmation
4. add re-entry suppression tuning
5. validate behavior on RK3588 with known and unknown workshop videos
This staged rollout reduces risk and allows behavior comparison at each step.
## 18. Expected Outcome
After this design is implemented, the system should behave as follows in the workshop face-recognition scenario:
- poor-quality face observations are ignored
- the same known employee is confirmed from accumulated evidence, not single-frame luck
- transient low-score frames do not become stranger alarms
- a person who stays in scene triggers at most one identity alarm
- short leave-and-return behavior does not trigger repeated punch alarms
- stranger alarms become rarer but more trustworthy
This is the intended trade-off for the workshop deployment: lower alarm frequency and higher alarm precision.