PLATFORM · AI VISION PIPELINE

The intelligence layer for the security operations center.

Every alert and search in Sentinel starts here: faces, plates, people, vehicles, and behavior recognized on your own hardware, indexed instantly, and searchable across millions of records. No cloud GPU. No frames leaving your network.

Live AI overlay on a 24 FPS feed showing labeled detections: PERSON · 98%, VEHICLE · 95%, FACE · 94%, and a WATCHLIST tag

On-device inference, no cloud GPU · ONNX Runtime end-to-end · Any ONVIF / RTSP camera

512-D

Face embedding per detection

searched by cosine similarity

Sub-second

Face and plate search

across millions of indexed records

0

Cloud GPUs required

inference runs on the edge agent

~410 MB

Bundled detection models

ship with the agent, ready offline

ONE PIPELINE, EVERY SURFACE

Recognize once. Use everywhere.

Sentinel runs a single vision pipeline and feeds everything from it: the live overlay an operator watches, the alert that fires in the control room, the cross-camera path on the investigation map, and the patterns on a person profile. Each model below is one stage of that pipeline, and each runs in two places — on the agent at the edge for the real-time picture, and in a server-side AI worker for batch and re-processing — using the same models, so the live view and the forensic record always agree.

DETECT · EMBED · MATCH

Find faces. Turn them into a number you can search.

SCRFD locates faces, a quality filter drops noise, and ArcFace converts each good face into a 512-D signature stored in pgvector for sub-second cosine kNN search — with identities auto-clustering as the network sees more of someone.

SCRFD (default) and YuNet (lightweight) face detectors, both in the agent
ArcFace 512-D embedding per face crop — the engine behind search, watchlist matching, and pattern-of-life
pgvector IVFFlat index with cosine distance for sub-second kNN lookups
Per-camera confidence threshold, tuned to individual camera quality
Identity confidence surfaced honestly — Identifying / Few samples / Low confidence / Probable / Confirmed
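To make the search step concrete, here is a minimal sketch of cosine kNN over face embeddings. It is a brute-force illustration in plain Python, not Sentinel's implementation: the names `cosine_similarity`, `knn`, and `gallery` are hypothetical, and toy 4-D vectors stand in for the 512-D ArcFace embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def knn(query: List[float], gallery: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k gallery entries most similar to the query, best first."""
    scored = [(face_id, cosine_similarity(query, emb)) for face_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 4-D vectors stand in for 512-D embeddings (assumption for readability).
gallery = {
    "face-a": [1.0, 0.0, 0.0, 0.0],
    "face-b": [0.9, 0.1, 0.0, 0.0],
    "face-c": [0.0, 1.0, 0.0, 0.0],
}
nearest = knn([1.0, 0.05, 0.0, 0.0], gallery, k=2)
```

An exhaustive scan like this scales linearly with the number of stored faces. An IVFFlat index in pgvector instead searches only the nearest partitions, trading a small amount of recall for speed, and pgvector's `<=>` operator orders rows by cosine distance (1 minus the similarity computed here).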

Live

Live face and person detection on real feeds — boxes drawn frame by frame as the network sees each subject.

READ · CLASSIFY

Read every plate. Know who and what is in every frame.

PaddleOCR reads plate characters in two stages and indexes them for full-text cross-camera traces, while a COCO-class YOLO detector simultaneously classifies people, vehicles, and more to drive events, zone counting, and loitering.

PaddleOCR two-stage detection + recognition — plate text full-text indexed
Cross-camera vehicle trace by plate, the same way as a person
80-class COCO detector — person and vehicle classes drive events and analytics
One shared YOLO model session across all cameras on an agent
Region-correct plate rendering in the UI
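A cross-camera plate trace can be sketched as grouping sightings by a normalized plate string and ordering them in time. This is an illustrative simplification: the platform is described as using full-text indexes, whereas this sketch uses exact matching on a normalized key, and `PlateSighting`, `normalize_plate`, and `build_traces` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PlateSighting:
    raw_text: str     # OCR output exactly as read
    camera_id: str
    timestamp: float  # seconds since epoch


def normalize_plate(raw_text: str) -> str:
    """Uppercase the text and keep only ASCII letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw_text.upper())


def build_traces(sightings: Iterable[PlateSighting]) -> Dict[str, List[PlateSighting]]:
    """Group sightings by normalized plate and sort each group by time."""
    traces: Dict[str, List[PlateSighting]] = defaultdict(list)
    for sighting in sightings:
        key = normalize_plate(sighting.raw_text)
        if key:  # skip reads that contain no usable characters
            traces[key].append(sighting)
    for key in traces:
        traces[key].sort(key=lambda s: s.timestamp)
    return dict(traces)


sightings = [
    PlateSighting("abc-1234", "cam-gate", 1700000300.0),
    PlateSighting("ABC 1234", "cam-lot", 1700000100.0),
    PlateSighting("XYZ 987", "cam-gate", 1700000200.0),
]
traces = build_traces(sightings)
path = [s.camera_id for s in traces["ABC1234"]]
```

Here `path` lists the cameras that saw the same plate in chronological order, which is the raw material for a cross-camera trace.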

Plate search interface showing a vehicle frame with recognized plate characters and a cross-camera sightings panel

From plate read to cross-camera trace in the same query.

TAG · SURFACE

Scene context on every event. The pipeline, drawn on the live video.

EfficientNet tags scene context on every event, while per-frame detections stream over WebSocket as a color-coded live overlay — click any box to drill into the event, and a confirmed person keeps the same color across every camera.

EfficientNet 1,000-class labeling with a low-confidence prefilter before persistence
Per-frame detections streamed over WebSocket for a smooth, real-time overlay
Color-coded by meaning — red for watchlist or weapon signal, amber for crowd or loitering, blue for vehicles
Click a box to open its event directly from the live view
Sticky per-subject color across all live views for cross-camera tracking at a glance
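The color rules above can be sketched as a small lookup plus a stable hash. This is a hedged illustration only: the palette, the `"white"` fallback, the precedence of alert colors over a subject's sticky color, and the names `overlay_color` and `sticky_subject_color` are all assumptions, not the product's actual scheme.

```python
import hashlib
from typing import Optional

RED, AMBER, BLUE = "red", "amber", "blue"

# Hypothetical palette for confirmed subjects; the real palette is not documented here.
SUBJECT_PALETTE = ["teal", "violet", "lime", "orange", "pink", "cyan"]


def sticky_subject_color(subject_id: str) -> str:
    """Derive a color from the subject ID so every view picks the same one."""
    digest = hashlib.sha256(subject_id.encode("utf-8")).digest()
    return SUBJECT_PALETTE[digest[0] % len(SUBJECT_PALETTE)]


def overlay_color(kind: str, subject_id: Optional[str] = None) -> str:
    """Map a detection kind to a box color, following the meanings listed above."""
    if kind in {"watchlist", "weapon"}:
        return RED
    if kind in {"crowd", "loitering"}:
        return AMBER
    if kind == "vehicle":
        return BLUE
    if subject_id is not None:
        return sticky_subject_color(subject_id)  # confirmed person keeps one color
    return "white"  # assumed fallback for unclassified detections
```

Because the color is derived from the subject ID rather than assigned per view, two dashboards showing different cameras agree on a subject's color without sharing any state.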

Events explorer showing scene-tagged event thumbnails with label chips and confidence scores

Scene tags enrich the timeline; the live overlay brings intelligence to the feed in real time.

ALL NINE MODELS

Every stage of the pipeline.

DETECT

Face Detection — SCRFD / YuNet

Locates faces in each frame, returning a bounding box, five landmarks, and a confidence score. A quality filter drops blurred, extreme-angle, or tiny faces before anything reaches the next stage.
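One way to picture the quality filter is as a set of cheap checks on each detection. The sketch below is an assumption-heavy illustration: the thresholds, the precomputed `sharpness` score, the landmark order (left eye, right eye, nose, left mouth corner, right mouth corner), and the `yaw_asymmetry` heuristic are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class FaceDetection:
    box: Tuple[float, float, float, float]  # x, y, width, height in pixels
    landmarks: List[Point]                  # assumed order: eyes, nose, mouth corners
    confidence: float
    sharpness: float                        # hypothetical blur score, higher is sharper


def yaw_asymmetry(landmarks: List[Point]) -> float:
    """Return 0.0 when the nose sits midway between the eyes, near 1.0 in profile."""
    left_eye, right_eye, nose = landmarks[0], landmarks[1], landmarks[2]
    d_left = abs(nose[0] - left_eye[0])
    d_right = abs(right_eye[0] - nose[0])
    total = d_left + d_right
    if total == 0:
        return 1.0  # degenerate landmarks, treat as unusable
    return abs(d_left - d_right) / total


def passes_quality(
    face: FaceDetection,
    min_size: float = 40.0,
    min_sharpness: float = 0.3,
    max_asymmetry: float = 0.5,
    min_confidence: float = 0.5,
) -> bool:
    """Reject tiny, blurred, low-confidence, or strongly turned faces."""
    _, _, width, height = face.box
    if min(width, height) < min_size:
        return False
    if face.sharpness < min_sharpness:
        return False
    if face.confidence < min_confidence:
        return False
    return yaw_asymmetry(face.landmarks) <= max_asymmetry
```

Filtering before embedding keeps poor crops out of the index, so they cannot drag down later matches.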

EMBED & MATCH

Face Embedding — ArcFace 512-D

Converts each detected face into a 512-dimension signature stored in pgvector and searched by cosine kNN. Auto-clusters embeddings into person profiles as the network sees more of someone.
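Auto-clustering can be illustrated with a greedy, incremental scheme: each new embedding joins the most similar existing profile if it clears a similarity threshold, and otherwise starts a new one. This is a sketch under stated assumptions, not the platform's clustering algorithm; `PersonProfile`, `assign`, and the 0.6 threshold are hypothetical.

```python
import math
from typing import List, Optional


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


class PersonProfile:
    def __init__(self, profile_id: str, embedding: List[float]) -> None:
        self.profile_id = profile_id
        self.centroid = normalize(embedding)
        self.samples = 1

    def add(self, embedding: List[float]) -> None:
        """Fold a new sample into the centroid (approximate running mean, re-normalized)."""
        e = normalize(embedding)
        n = self.samples
        merged = [(c * n + x) / (n + 1) for c, x in zip(self.centroid, e)]
        self.centroid = normalize(merged)
        self.samples = n + 1


def assign(embedding: List[float], profiles: List[PersonProfile], threshold: float = 0.6) -> PersonProfile:
    """Attach the embedding to the best-matching profile or create a new profile."""
    e = normalize(embedding)
    best: Optional[PersonProfile] = None
    best_sim = -1.0
    for profile in profiles:
        sim = sum(a * b for a, b in zip(profile.centroid, e))
        if sim > best_sim:
            best, best_sim = profile, sim
    if best is not None and best_sim >= threshold:
        best.add(embedding)
        return best
    created = PersonProfile(f"person-{len(profiles) + 1}", embedding)
    profiles.append(created)
    return created
```

The sample count per profile is the kind of signal that could sit behind labels such as "Few samples", although how the product derives its confidence bands is not specified here.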

READ

Plate & Text OCR — PaddleOCR

Two-stage detection + recognition pipeline reads plate characters and indexes them for full-text search, so a vehicle of interest can be traced across cameras the same way a person can.

CLASSIFY

Object, Person & Vehicle — COCO YOLO

80-class detector classifies people, cars, trucks, buses, motorcycles, and more in real time. Drives pedestrian and vehicle events, zone counting, loitering, and the live overlay bounding boxes.

WEAPON ALERT — BETA

Weapon Alert (Beta)

A triage signal — not a verdict. Requires multi-frame confirmation (default 5 frames) at a higher confidence threshold before any alert fires. Human-in-the-loop only; the system takes no autonomous action, ever.
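Multi-frame confirmation can be sketched as a streak counter. The sketch assumes the required frames must be consecutive and uses a hypothetical threshold of 0.8; only the default of 5 frames comes from the description above. `MultiFrameConfirmer` is an illustrative name, and its output is a flag for a human reviewer, not an action.

```python
class MultiFrameConfirmer:
    """Raise a flag only after enough consecutive high-confidence frames."""

    def __init__(self, required_frames: int = 5, threshold: float = 0.8) -> None:
        self.required_frames = required_frames
        self.threshold = threshold  # hypothetical value, set higher than normal detection
        self.streak = 0
        self.fired = False

    def update(self, confidence: float) -> bool:
        """Feed one frame's score; return True once, when the streak is first reached."""
        if confidence >= self.threshold:
            self.streak += 1
        else:
            # Any weak frame resets the streak (assumes consecutive frames are required).
            self.streak = 0
            self.fired = False
        if self.streak >= self.required_frames and not self.fired:
            self.fired = True
            return True
        return False


confirmer = MultiFrameConfirmer()
scores = [0.9, 0.9, 0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
flags = [confirmer.update(score) for score in scores]
```

In this run the dip to 0.5 on the fifth frame resets the count, so the flag is raised only after five further strong frames, and it is raised once rather than on every subsequent frame.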

EMOTION — BETA

Emotion (Beta)

EfficientNet estimates a primary expression per face across seven classes. Surfaced as an optional chip on the events explorer — a where-to-look hint only, never a conclusion about a person.

TAG

Image Labels — EfficientNet

1,000-class ImageNet classifier tags scene context behind a detection — outdoor, street, vehicle — with a prefilter that drops low-confidence noise before anything is stored.
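The prefilter idea reduces to dropping weak labels before anything is stored. A minimal sketch, assuming a hypothetical cutoff of 0.3 and a cap of three labels per event (neither value is documented); `prefilter_labels` is an illustrative name.

```python
from typing import List, Tuple


def prefilter_labels(
    predictions: List[Tuple[str, float]],
    min_confidence: float = 0.3,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Keep only labels at or above the cutoff, strongest first, capped at top_k."""
    kept = [(label, score) for label, score in predictions if score >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


tags = prefilter_labels([("street", 0.72), ("umbrella", 0.04), ("vehicle", 0.41), ("outdoor", 0.88)])
```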

BEHAVIOR

Crowd Density, Loitering & Zones

Zone counting reports a live people count per polygon. Loitering tracks dwell time per zone. Line crossing counts directional entries and exits. Motion gating skips empty scenes to save CPU.
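The three behaviors above rest on simple geometry. The following sketch shows one common way to build them: a ray-casting point-in-polygon test for zone counts, an entry timestamp per track for dwell time, and a side-of-line sign change for directional crossings. All names, the 60-second loitering default, and the use of an infinite line rather than a bounded segment are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a rightward ray from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(foot_points: List[Point], polygon: List[Point]) -> int:
    """Live people count for one zone, given one ground point per person."""
    return sum(1 for p in foot_points if point_in_polygon(p, polygon))


class DwellTracker:
    """Track how long each subject has stayed inside a zone."""

    def __init__(self, loiter_seconds: float = 60.0) -> None:
        self.loiter_seconds = loiter_seconds  # hypothetical default
        self.entered_at: Dict[str, float] = {}

    def update(self, track_id: str, in_zone: bool, now: float) -> bool:
        """Return True while the subject's dwell time meets the loitering limit."""
        if not in_zone:
            self.entered_at.pop(track_id, None)  # leaving resets the clock
            return False
        start = self.entered_at.setdefault(track_id, now)
        return now - start >= self.loiter_seconds


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed area test: positive on one side of line a-b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_direction(prev: Point, curr: Point, a: Point, b: Point) -> int:
    """Return +1 or -1 when a track changes sides of the line through a and b, else 0."""
    s0, s1 = side_of_line(prev, a, b), side_of_line(curr, a, b)
    if s0 < 0 <= s1:
        return 1
    if s0 >= 0 > s1:
        return -1
    return 0
```

Summing the +1 and -1 results per line over time gives directional entry and exit counts.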

SEE IT LIVE

Real-Time AI Overlay

Per-frame detections stream to the dashboard over WebSocket. Boxes are color-coded by meaning, and clicking any box opens its event. A confirmed person keeps the same color across every camera.

Weapon alert and emotion classification are Beta and are not presented as definitive. Performance figures describe design targets on representative hardware; real-world accuracy depends on camera quality, placement, scene, and lighting.

HOW IT WORKS

Frame to decision, in five stages.

Frame

The agent pulls frames from each ONVIF / RTSP camera at a configurable detector cadence (typically 1–5 FPS). Motion gating skips empty, stationary scenes.
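The relationship between stream rate, detector cadence, and motion gating can be sketched as follows. This is an illustration under assumptions: frames are modeled as flat lists of grayscale pixel values, motion is measured as mean absolute difference against the previously sampled frame, the first sampled frame is always processed, and the threshold of 2.0 is hypothetical.

```python
from typing import List


def frame_stride(stream_fps: float, detector_fps: float) -> int:
    """How many stream frames to advance between detector runs."""
    return max(1, round(stream_fps / detector_fps))


def mean_abs_diff(prev: List[int], curr: List[int]) -> float:
    """Average per-pixel change between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def frames_to_process(
    frames: List[List[int]],
    stream_fps: float,
    detector_fps: float,
    motion_threshold: float = 2.0,
) -> List[int]:
    """Return indices of frames that are both on cadence and show motion."""
    stride = frame_stride(stream_fps, detector_fps)
    selected: List[int] = []
    reference = None
    for index in range(0, len(frames), stride):
        frame = frames[index]
        # Always run on the first sample; afterwards require visible change.
        if reference is None or mean_abs_diff(reference, frame) >= motion_threshold:
            selected.append(index)
        reference = frame
    return selected


static = [10] * 16
moved = [10] * 8 + [200] * 8
clip = [static] * 24 + [moved] * 24  # two seconds of a 24 FPS stream
picked = frames_to_process(clip, stream_fps=24, detector_fps=2)
```

With a 24 FPS stream and a 2 FPS detector cadence, only every twelfth frame is a candidate, and of those only the ones that differ from the previous sample reach the models.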

Detect

YOLO finds people and vehicles; SCRFD/YuNet finds faces; PaddleOCR finds plate and text regions. Quality and human-presence filters drop the noise.

Embed

ArcFace converts each good face into a 512-D embedding; PaddleOCR reads plate characters; EfficientNet tags scene context and expression.

Index

Embeddings land in pgvector (cosine kNN), plates and text go into full-text indexes, and every detection becomes a structured event with class, confidence, box, camera, and timestamp.
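A structured event with the fields named above might look like the following record. The field names, the box layout (x, y, width, height), and the ISO 8601 timestamp format are assumptions for illustration; the actual event schema is not given here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DetectionEvent:
    detected_class: str             # e.g. "person", "car", "face"
    confidence: float               # model score in the range 0.0 to 1.0
    box: Tuple[int, int, int, int]  # assumed layout: x, y, width, height in pixels
    camera_id: str
    timestamp: str                  # assumed ISO 8601 string in UTC

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def to_json(self) -> str:
        """Serialize to JSON with stable key order for storage or streaming."""
        return json.dumps(asdict(self), sort_keys=True)


event = DetectionEvent("person", 0.98, (120, 80, 64, 128), "cam-lobby", "2024-01-01T12:00:00Z")
payload = json.loads(event.to_json())
```

Keeping every detection in one uniform shape is what lets the overlay, alerts, and search surfaces read the same records.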

Surface

The result appears everywhere at once — live overlay, alert banner, threat picture, events explorer, cross-camera map, and pattern-of-life profile. All reading the same indexed intelligence.

SPECIFICATIONS

Pipeline technical details

Inference runtime	ONNX Runtime 1.22.0
Face embedding dimensions	512 (ArcFace)
Face search engine	PostgreSQL pgvector — IVFFlat index, cosine distance
Object classes	80 (COCO); face detection via SCRFD (default) / YuNet (lightweight)
Plate / text OCR	PaddleOCR two-stage — detection + recognition; full-text indexed
Image labels	EfficientNet — 1,000 ImageNet classes; low-confidence prefilter before persistence
Detector cadence	Configurable, typically 1–5 FPS per camera (well below playback rate)
Processing location	On-site edge agent (Windows) + server-side AI worker (Linux) — same models
Cloud GPU required	No — ONNX Runtime runs on standard CPU hardware on the edge
Internet required for inference	No — models ship with the agent, fully offline capable
Model bundle size	~410 MB total (.onnx files) — ready to run on first install
Supported cameras	Any ONVIF / RTSP — auto-discovered by WS-Discovery or added by URL
Weapon alert	Beta — multi-frame confirmation (default 5 frames), higher confidence threshold, human-in-the-loop, no autonomous action
Emotion classification	Beta — 7-class expression estimate, per-class confidence, presented as a triage hint only
Identity confidence	Honest bands in the UI: Identifying / Few samples / Low confidence / Probable / Confirmed — never a false certainty

Specifications describe shipped platform capabilities and design targets on representative hardware. We will confirm the configuration that fits your deployment during your demo.

EXPLORE THE PLATFORM

The vision pipeline powers the whole platform.

Face & Plate Search

The operator-facing capability this pipeline powers — find a person of interest or vehicle across every camera in seconds.

Learn more

Scale & Performance

How the pipeline and its indexes hold up across thousands of cameras and millions of events — benchmarked, not claimed.

Learn more

Deployment & On-Prem

Run the full AI pipeline in your cloud, on-premises inside your firewall, or fully air-gapped, with the same models and the same intelligence.

Learn more

See the pipeline on live data.

Request demo access and watch the vision pipeline work end to end — faces and plates recognized on the live overlay, a person of interest found across every camera, and the same intelligence surfacing on the investigation map. Every model running on the edge, every action accounted for.
