PLATFORM · AI VISION PIPELINE
The intelligence layer for the security operations center.
Every alert and search in Sentinel starts here: faces, plates, people, vehicles, and behavior recognized on your own hardware, indexed instantly, and searchable across millions of records. No cloud GPU. No frames leaving your network.
On-device inference, no cloud GPU · ONNX Runtime end-to-end · Any ONVIF / RTSP camera
ONE PIPELINE, EVERY SURFACE
Recognize once. Use everywhere.
Sentinel runs a single vision pipeline and feeds everything from it: the live overlay an operator watches, the alert that fires in the control room, the cross-camera path on the investigation map, and the patterns on a person profile. Each model below is one stage of that pipeline, and each runs in two places — on the agent at the edge for the real-time picture, and in a server-side AI worker for batch and re-processing — using the same models, so the live view and the forensic record always agree.
Find faces. Turn them into a number you can search.
SCRFD locates faces, a quality filter drops noise, and ArcFace converts each good face into a 512-D signature stored in pgvector for sub-second cosine kNN search — with identities auto-clustering as the network sees more of someone.
- SCRFD (default) and YuNet (lightweight) face detectors, both in the agent
- ArcFace 512-D embedding per face crop — the engine behind search, watchlist matching, and pattern-of-life
- pgvector IVFFlat index with cosine distance for sub-second approximate kNN lookups
- Per-camera confidence threshold, tuned to individual camera quality
- Identity confidence surfaced honestly — Identifying / Few samples / Low confidence / Probable / Confirmed
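As a rough illustration of the cosine kNN lookup described above, the sketch below ranks a toy gallery by cosine distance in plain Python. The 3-D vectors stand in for 512-D ArcFace embeddings, and the table and column names in the SQL string (`face_embeddings`, `embedding`) are hypothetical; only the `<=>` cosine-distance operator is taken from pgvector itself. It is a minimal sketch, not Sentinel's actual query path.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def knn(query: Sequence[float],
        gallery: Dict[str, Sequence[float]],
        k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force k nearest neighbours by cosine distance, closest first."""
    scored = sorted((cosine_distance(query, vec), name) for name, vec in gallery.items())
    return [(name, dist) for dist, name in scored[:k]]


# Hypothetical pgvector equivalent (table and column names are assumptions).
PGVECTOR_QUERY = """
SELECT id, embedding <=> %(query)s AS distance
FROM face_embeddings
ORDER BY embedding <=> %(query)s
LIMIT %(k)s;
"""

# Toy gallery: real embeddings would be 512-D.
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [0.9, 0.1, 0.0],
}
nearest = knn([1.0, 0.0, 0.0], gallery, k=2)
```

With an IVFFlat index the database answers the same query approximately, trading a small amount of recall for speed, whereas the brute-force function here is exact.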
Live face and person detection on real feeds — boxes drawn frame by frame as the network sees each subject.
Read every plate. Know who and what is in every frame.
PaddleOCR reads plate characters in two stages and indexes them for full-text cross-camera traces, while a COCO-class YOLO detector simultaneously classifies people, vehicles, and more to drive events, zone counting, and loitering.
- PaddleOCR two-stage detection + recognition — plate text full-text indexed
- Cross-camera vehicle trace by plate, the same way as a person
- 80-class COCO detector — person and vehicle classes drive events and analytics
- One shared YOLO model session across all cameras on an agent
- Region-correct plate rendering in the UI
From plate read to cross-camera trace in the same query.
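A cross-camera trace by plate can be pictured as grouping reads by normalized plate text and ordering them in time. The sketch below does this in memory; the `PlateRead` type, the `normalize_plate` rule, and the field names are illustrative assumptions, and a production system would query the full-text index instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PlateRead:
    """One OCR result (hypothetical shape)."""
    plate: str        # raw text as read by OCR
    camera: str       # camera identifier
    timestamp: float  # seconds since epoch


def normalize_plate(text: str) -> str:
    """Upper-case and strip separators so 'abc-123' and 'ABC 123' match (assumed rule)."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def trace_by_plate(reads: Iterable[PlateRead]) -> Dict[str, List[PlateRead]]:
    """Group reads by normalized plate, each group sorted oldest to newest."""
    traces: Dict[str, List[PlateRead]] = defaultdict(list)
    for read in reads:
        traces[normalize_plate(read.plate)].append(read)
    for sightings in traces.values():
        sightings.sort(key=lambda r: r.timestamp)
    return dict(traces)


reads = [
    PlateRead("abc-123", "cam-2", 20.0),
    PlateRead("ABC 123", "cam-1", 10.0),
    PlateRead("XYZ9", "cam-3", 5.0),
]
traces = trace_by_plate(reads)
path = [r.camera for r in traces["ABC123"]]
```

Here `path` lists the cameras in the order the vehicle passed them, which is the information a map view would need to draw the route.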
Scene context on every event. The pipeline, drawn on the live video.
EfficientNet tags scene context on every event, while per-frame detections stream over WebSocket as a color-coded live overlay — click any box to drill into the event, and a confirmed person keeps the same color across every camera.
- EfficientNet 1,000-class labeling with a low-confidence prefilter before persistence
- Per-frame detections streamed over WebSocket for a smooth, real-time overlay
- Color-coded by meaning — red for watchlist or weapon signal, amber for crowd or loitering, blue for vehicles
- Click a box to open its event directly from the live view
- Sticky per-subject color across all live views for cross-camera tracking at a glance
Scene tags enrich the timeline; the live overlay brings intelligence to the feed in real time.
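The color rules for the overlay can be sketched as a small lookup plus a stable hash for the sticky per-subject color. The signal names, the neutral fallback, and the palette below are assumptions made for illustration; only the red, amber, and blue meanings come from the description above.

```python
import hashlib

# Meaning-based colors from the description above; signal names are hypothetical.
SIGNAL_COLORS = {
    "watchlist": "red",
    "weapon": "red",
    "crowd": "amber",
    "loitering": "amber",
    "vehicle": "blue",
}
DEFAULT_COLOR = "gray"  # assumed fallback for anything not listed

# Hypothetical palette for confirmed persons.
STICKY_PALETTE = ["teal", "purple", "green", "orange", "pink", "cyan"]


def color_for_signal(signal: str) -> str:
    """Map a detection's signal type to its overlay color."""
    return SIGNAL_COLORS.get(signal, DEFAULT_COLOR)


def sticky_color(subject_id: str) -> str:
    """Pick a palette color that depends only on the subject id.

    A cryptographic hash is used instead of Python's built-in hash(), which is
    randomized per process and would give different colors on different agents.
    """
    digest = hashlib.sha256(subject_id.encode("utf-8")).digest()
    return STICKY_PALETTE[digest[0] % len(STICKY_PALETTE)]
```

Because the color is derived from the subject id alone, every view that knows the id can compute the same color without coordinating with the others.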
ALL NINE MODELS
Every stage of the pipeline.
Face Detection — SCRFD / YuNet
Locates faces in each frame, returning a bounding box, five landmarks, and a confidence score. A quality filter drops blurred, extreme-angle, or tiny faces before anything reaches the next stage.
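One way to picture the quality filter is as a set of cheap checks on each detection. The sketch below is an assumption-heavy illustration: the `Face` type, every threshold, the landmark order (left eye, right eye, nose, then mouth corners), and the precomputed `sharpness` score are all hypothetical, and the yaw check is a simple heuristic rather than a pose estimator.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Face:
    """A detected face (hypothetical shape)."""
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels
    landmarks: List[Point]                  # assumed order: eyes, nose, mouth corners
    score: float                            # detector confidence, 0..1
    sharpness: float                        # precomputed blur metric (assumption)


def yaw_ratio(face: Face) -> float:
    """Horizontal nose offset from the eye midpoint, relative to eye spacing.

    Near 0 for a frontal face and larger as the head turns away.
    """
    (lx, _), (rx, _), (nx, _) = face.landmarks[:3]
    eye_span = abs(rx - lx)
    if eye_span == 0:
        return float("inf")
    return abs(nx - (lx + rx) / 2) / eye_span


def passes_quality(face: Face,
                   min_side: float = 40.0,
                   min_sharpness: float = 100.0,
                   max_yaw_ratio: float = 0.35,
                   min_score: float = 0.5) -> bool:
    """Return True only if the face is large, sharp, confident, and roughly frontal."""
    x1, y1, x2, y2 = face.box
    if min(x2 - x1, y2 - y1) < min_side:
        return False  # too small
    if face.sharpness < min_sharpness:
        return False  # too blurred
    if face.score < min_score:
        return False  # detector not confident enough
    return yaw_ratio(face) <= max_yaw_ratio  # reject extreme angles
```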
Face Embedding — ArcFace 512-D
Converts each detected face into a 512-dimension signature stored in pgvector and searched by cosine kNN. Auto-clusters embeddings into person profiles as the network sees more of someone.
Plate & Text OCR — PaddleOCR
Two-stage detection + recognition pipeline reads plate characters and indexes them for full-text search, so a vehicle of interest can be traced across cameras the same way a person can.
Object, Person & Vehicle — COCO YOLO
80-class detector classifies people, cars, trucks, buses, motorcycles, and more in real time. Drives pedestrian and vehicle events, zone counting, loitering, and the live overlay bounding boxes.
Weapon Alert (Beta)
A triage signal — not a verdict. Requires multi-frame confirmation (default 5 frames) at a higher confidence threshold before any alert fires. Human-in-the-loop only; the system takes no autonomous action, ever.
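Multi-frame confirmation can be sketched as a sliding window over per-frame confidences. The version below assumes the frames must be consecutive and uses a placeholder threshold of 0.8; the source states only the default of 5 frames and "a higher confidence threshold", so both of those choices are assumptions.

```python
from collections import deque


class MultiFrameConfirmer:
    """Signal only after `required_frames` consecutive frames meet the threshold."""

    def __init__(self, required_frames: int = 5, threshold: float = 0.8) -> None:
        if required_frames < 1:
            raise ValueError("required_frames must be at least 1")
        self.required_frames = required_frames
        self.threshold = threshold  # hypothetical value
        # Keeps only the most recent `required_frames` pass/fail results.
        self._recent = deque(maxlen=required_frames)

    def update(self, confidence: float) -> bool:
        """Record one frame's confidence; return True if the window is fully confirmed."""
        self._recent.append(confidence >= self.threshold)
        return len(self._recent) == self.required_frames and all(self._recent)


confirmer = MultiFrameConfirmer()
results = [confirmer.update(0.9) for _ in range(5)]
```

In this sketch a single weak frame empties the confirmation: the window must fill with passing frames again before `update` returns True, and even then the result is only a signal for an operator to review.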
Emotion (Beta)
EfficientNet estimates a primary expression per face across seven classes. Surfaced as an optional chip on the events explorer — a where-to-look hint only, never a conclusion about a person.
Image Labels — EfficientNet
1,000-class ImageNet classifier tags scene context behind a detection — outdoor, street, vehicle — with a prefilter that drops low-confidence noise before anything is stored.
Crowd Density, Loitering & Zones
Zone counting reports a live people count per polygon. Loitering tracks dwell time per zone. Line crossing counts directional entries and exits. Motion gating skips empty scenes to save CPU.
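Zone counting and loitering both reduce to asking whether a tracked point lies inside a polygon. The sketch below uses the standard ray-casting test and a simple dwell timer; the `DwellTracker` class, the track ids, and the 30-second limit are hypothetical names and values chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(points: Iterable[Point], polygon: List[Point]) -> int:
    """Live people count for one zone, given one reference point per person."""
    return sum(point_in_polygon(x, y, polygon) for x, y in points)


class DwellTracker:
    """Flags a track as loitering once it has stayed in the zone for `limit_seconds`."""

    def __init__(self, polygon: List[Point], limit_seconds: float) -> None:
        self.polygon = polygon
        self.limit_seconds = limit_seconds
        self._entered: Dict[Hashable, float] = {}  # track id -> time of entry

    def update(self, track_id: Hashable, x: float, y: float, now: float) -> bool:
        if not point_in_polygon(x, y, self.polygon):
            self._entered.pop(track_id, None)  # leaving the zone resets the timer
            return False
        start = self._entered.setdefault(track_id, now)
        return now - start >= self.limit_seconds


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
people = [(5.0, 5.0), (15.0, 5.0), (1.0, 9.0)]
occupancy = count_in_zone(people, square)
```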
Real-Time AI Overlay
Per-frame detections stream to the dashboard over WebSocket and are drawn as boxes color-coded by meaning. Click any box to open its event. A confirmed person keeps the same color across every camera.
Weapon alert and emotion classification are Beta and are not presented as definitive. Performance figures describe design targets on representative hardware; real-world accuracy depends on camera quality, placement, scene, and lighting.
HOW IT WORKS
Frame to decision, in five stages.
Frame
The agent pulls frames from each ONVIF / RTSP camera at a configurable detector cadence (typically 1–5 FPS). Motion gating skips empty, stationary scenes.
Detect
YOLO finds people and vehicles; SCRFD/YuNet finds faces; PaddleOCR finds plate and text regions. Quality and human-presence filters drop the noise.
Embed
ArcFace converts each good face into a 512-D embedding; PaddleOCR reads plate characters; EfficientNet tags scene context and expression.
Index
Embeddings land in pgvector (cosine kNN), plates and text go into full-text indexes, and every detection becomes a structured event with class, confidence, box, camera, and timestamp.
Surface
The result appears everywhere at once: live overlay, alert banner, threat picture, events explorer, cross-camera map, and pattern-of-life profile, all reading the same indexed intelligence.
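The structured event produced at the Index stage can be sketched as a small immutable record carrying the five fields named there: class, confidence, box, camera, and timestamp. The `DetectionEvent` type, its field names, and the JSON encoding are assumptions for illustration, not Sentinel's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class DetectionEvent:
    """One detection as a structured event (hypothetical schema)."""
    label: str                              # detected class, e.g. "person"
    confidence: float                       # 0..1
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels
    camera: str                             # camera identifier
    timestamp: str                          # ISO 8601, UTC


def make_event(label: str, confidence: float,
               box: Tuple[float, float, float, float],
               camera: str, when: datetime) -> DetectionEvent:
    """Validate inputs and normalize the timestamp to UTC."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return DetectionEvent(label, confidence, box, camera,
                          when.astimezone(timezone.utc).isoformat())


def to_json(event: DetectionEvent) -> str:
    """Serialize for storage or for streaming to a dashboard."""
    return json.dumps(asdict(event), sort_keys=True)


event = make_event("person", 0.91, (10.0, 20.0, 50.0, 80.0), "cam-1",
                   datetime(2024, 1, 1, tzinfo=timezone.utc))
payload = to_json(event)
```

Keeping one event shape for every surface is what would let the overlay, alerts, and explorer read the same record instead of each re-deriving it.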
SPECIFICATIONS
Pipeline technical details
| Inference runtime | ONNX Runtime 1.22.0 |
| Face embedding dimensions | 512 (ArcFace) |
| Face search engine | PostgreSQL pgvector — IVFFlat index, cosine distance |
| Object classes | 80 (COCO); face detection via SCRFD (default) / YuNet (lightweight) |
| Plate / text OCR | PaddleOCR two-stage — detection + recognition; full-text indexed |
| Image labels | EfficientNet — 1,000 ImageNet classes; low-confidence prefilter before persistence |
| Detector cadence | Configurable, typically 1–5 FPS per camera (well below playback rate) |
| Processing location | On-site edge agent (Windows) + server-side AI worker (Linux) — same models |
| Cloud GPU required | No — ONNX Runtime runs on standard CPU hardware on the edge |
| Internet required for inference | No — models ship with the agent, fully offline capable |
| Model bundle size | ~410 MB total (.onnx files) — ready to run on first install |
| Supported cameras | Any ONVIF / RTSP — auto-discovered by WS-Discovery or added by URL |
| Weapon alert | Beta — multi-frame confirmation (default 5 frames), higher confidence threshold, human-in-the-loop, no autonomous action |
| Emotion classification | Beta — 7-class expression estimate, per-class confidence, presented as a triage hint only |
| Identity confidence | Honest bands in the UI: Identifying / Few samples / Low confidence / Probable / Confirmed — never a false certainty |
Specifications describe shipped platform capabilities and design targets on representative hardware. We will confirm the configuration that fits your deployment during your demo.
EXPLORE THE PLATFORM
The vision pipeline powers the whole platform.
Face & Plate Search
The operator-facing capability this pipeline powers — find a person of interest or vehicle across every camera in seconds.
Scale & Performance
How the pipeline and its indexes hold up across thousands of cameras and millions of events — benchmarked, not claimed.
Deployment & On-Prem
Run the full AI pipeline in your cloud, on-premise inside your firewall, or fully air-gapped — same models, same intelligence.
See the pipeline on live data.
Request demo access and watch the vision pipeline work end to end — faces and plates recognized on the live overlay, a person of interest found across every camera, and the same intelligence surfacing on the investigation map. Every model running on the edge, every action accounted for.