PLATFORM · AI VISION PIPELINE

The intelligence layer for the security operations center.

Every alert and search in Sentinel starts here: faces, plates, people, vehicles, and behavior recognized on your own hardware, indexed instantly, and searchable across millions of records. No cloud GPU. No frames leaving your network.

[Live overlay illustration: LIVE · AI feed at 24 FPS with PERSON · 98%, VEHICLE · 95%, FACE · 94%, and WATCHLIST labels]

On-device inference, no cloud GPU · ONNX Runtime end-to-end · Any ONVIF / RTSP camera

  • 512-D: face embedding per detection, searched by cosine similarity
  • Sub-second: face and plate search across millions of indexed records
  • 0: cloud GPUs required; inference runs on the edge agent
  • ~410 MB: bundled detection models that ship with the agent, ready offline

ONE PIPELINE, EVERY SURFACE

Recognize once. Use everywhere.

Sentinel runs a single vision pipeline and feeds everything from it: the live overlay an operator watches, the alert that fires in the control room, the cross-camera path on the investigation map, and the patterns on a person profile. Each model below is one stage of that pipeline, and each runs in two places — on the agent at the edge for the real-time picture, and in a server-side AI worker for batch and re-processing — using the same models, so the live view and the forensic record always agree.

DETECT · EMBED · MATCH

Find faces. Turn them into a number you can search.

SCRFD locates faces, a quality filter drops noise, and ArcFace converts each good face into a 512-D signature stored in pgvector for sub-second cosine kNN search — with identities auto-clustering as the network sees more of someone.

  • SCRFD (default) and YuNet (lightweight) face detectors, both in the agent
  • ArcFace 512-D embedding per face crop — the engine behind search, watchlist matching, and pattern-of-life
  • pgvector IVFFlat index (approximate nearest-neighbor) with cosine distance for sub-second kNN lookups
  • Per-camera confidence threshold, tuned to individual camera quality
  • Identity confidence surfaced honestly — Identifying / Few samples / Low confidence / Probable / Confirmed
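
As a rough illustration of what cosine kNN search means here, the sketch below ranks a small in-memory gallery by cosine similarity. It is a brute-force stand-in, not Sentinel's implementation: in the product the lookup is served by the pgvector index, and the function names and toy 4-D vectors (in place of 512-D embeddings) are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          gallery: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k gallery entries most similar to the query, best first."""
    scored = [(face_id, cosine_similarity(query, emb))
              for face_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical gallery: toy 4-D vectors stand in for 512-D embeddings.
gallery = {
    "face-a": [1.0, 0.0, 0.0, 0.0],
    "face-b": [0.9, 0.1, 0.0, 0.0],
    "face-c": [0.0, 1.0, 0.0, 0.0],
}
matches = top_k([1.0, 0.0, 0.0, 0.0], gallery, k=2)
```

An exact scan like this grows linearly with the gallery; an index such as IVFFlat trades a little recall for speed by searching only the nearest partitions.
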

Live face and person detection on real feeds — boxes drawn frame by frame as the network sees each subject.

READ · CLASSIFY

Read every plate. Know who and what is in every frame.

PaddleOCR reads plate characters in two stages and indexes them for full-text cross-camera traces, while a COCO-class YOLO detector simultaneously classifies people, vehicles, and more to drive events, zone counting, and loitering.

  • PaddleOCR two-stage detection + recognition — plate text full-text indexed
  • Cross-camera vehicle trace by plate, the same way a person is traced by face
  • 80-class COCO detector — person and vehicle classes drive events and analytics
  • One shared YOLO model session across all cameras on an agent
  • Region-correct plate rendering in the UI
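
A cross-camera trace by plate can be pictured as grouping reads by plate text and ordering each group in time. The sketch below is a simplified stand-in under stated assumptions: it uses exact matching on a normalized string rather than a full-text index, and `PlateRead`, `normalize_plate`, and `build_traces` are hypothetical names, not part of Sentinel.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlateRead:
    camera_id: str
    timestamp: float  # seconds since epoch
    text: str         # raw OCR output


def normalize_plate(text: str) -> str:
    """Uppercase the text and keep only letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def build_traces(reads: List[PlateRead]) -> Dict[str, List[PlateRead]]:
    """Group reads by normalized plate and sort each group by time."""
    traces: Dict[str, List[PlateRead]] = defaultdict(list)
    for read in reads:
        key = normalize_plate(read.text)
        if key:  # skip reads with no usable characters
            traces[key].append(read)
    for sightings in traces.values():
        sightings.sort(key=lambda r: r.timestamp)
    return dict(traces)


# Illustrative reads: the same plate seen by two cameras with OCR spacing noise.
reads = [
    PlateRead("cam-2", 20.0, "abc-123"),
    PlateRead("cam-1", 10.0, "ABC 123"),
    PlateRead("cam-3", 15.0, "XYZ 789"),
]
traces = build_traces(reads)
route = [r.camera_id for r in traces["ABC123"]]
```
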
[Image: Plate search interface showing a vehicle frame with recognized plate characters and a cross-camera sightings panel]

From plate read to cross-camera trace in the same query.

TAG · SURFACE

Scene context on every event. The pipeline, drawn on the live video.

EfficientNet tags scene context on every event, while per-frame detections stream over WebSocket as a color-coded live overlay — click any box to drill into the event, and a confirmed person keeps the same color across every camera.

  • EfficientNet 1,000-class labeling with a low-confidence prefilter before persistence
  • Per-frame detections streamed over WebSocket for a smooth, real-time overlay
  • Color-coded by meaning — red for watchlist or weapon signal, amber for crowd or loitering, blue for vehicles
  • Click a box to open its event directly from the live view
  • Sticky per-subject color across all live views for cross-camera tracking at a glance
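
The color rules above can be sketched as a small lookup. This is an illustration only: the precedence order, the vehicle class names, the subject palette, and the gray fallback are assumptions, and hashing the subject ID is just one way to keep a color stable across views.

```python
import hashlib
from typing import Optional

# Hypothetical values chosen for the example.
VEHICLE_CLASSES = {"car", "truck", "bus", "motorcycle"}
SUBJECT_PALETTE = ["teal", "purple", "green", "orange"]


def overlay_color(label: str,
                  signal: Optional[str] = None,
                  subject_id: Optional[str] = None) -> str:
    """Pick a box color from the detection's meaning."""
    if signal in {"watchlist", "weapon"}:
        return "red"
    if signal in {"crowd", "loitering"}:
        return "amber"
    if label in VEHICLE_CLASSES:
        return "blue"
    if subject_id is not None:
        # A stable hash keeps one subject on one color in every view and
        # across restarts (the built-in hash() is randomized per process).
        digest = hashlib.sha256(subject_id.encode("utf-8")).digest()
        return SUBJECT_PALETTE[digest[0] % len(SUBJECT_PALETTE)]
    return "gray"


watchlist_color = overlay_color("person", signal="watchlist")
vehicle_color = overlay_color("car")
subject_color = overlay_color("person", subject_id="p-17")
```
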
[Image: Events explorer showing scene-tagged event thumbnails with label chips and confidence scores]

Scene tags enrich the timeline; the live overlay brings intelligence to the feed in real time.

ALL NINE MODELS

Every stage of the pipeline.

DETECT

Face Detection — SCRFD / YuNet

Locates faces in each frame, returning a bounding box, five landmarks, and a confidence score. A quality filter drops blurred, extreme-angle, or tiny faces before anything reaches the next stage.

EMBED & MATCH

Face Embedding — ArcFace 512-D

Converts each detected face into a 512-dimension signature stored in pgvector and searched by cosine kNN. Auto-clusters embeddings into person profiles as the network sees more of someone.
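
The source does not describe how auto-clustering works, so the following is only one plausible approach, shown to make the idea concrete: a greedy online clusterer that attaches each new embedding to the most similar existing profile, or starts a new profile when nothing is similar enough. The class name, the 0.6 threshold, and the running-mean centroid are all assumptions.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ProfileClusterer:
    """Greedy online clustering of embeddings into profiles (illustrative)."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold             # hypothetical similarity cut-off
        self.centroids: List[List[float]] = [] # running mean per profile
        self.counts: List[int] = []            # samples seen per profile

    def assign(self, emb: Sequence[float]) -> int:
        """Return the profile index for emb, creating a profile if needed."""
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(self.centroids):
            sim = _cosine(emb, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            n = self.counts[best_idx]
            old = self.centroids[best_idx]
            # Update the running mean with the new sample.
            self.centroids[best_idx] = [(c * n + e) / (n + 1)
                                        for c, e in zip(old, emb)]
            self.counts[best_idx] = n + 1
            return best_idx
        self.centroids.append(list(emb))
        self.counts.append(1)
        return len(self.centroids) - 1


clusterer = ProfileClusterer(threshold=0.6)
ids = [clusterer.assign(e) for e in ([1.0, 0.0, 0.0],
                                     [0.95, 0.05, 0.0],
                                     [0.0, 1.0, 0.0])]
```

A per-profile sample count like `counts` is also the kind of signal that could feed a "Few samples" versus "Confirmed" band, though how Sentinel derives its bands is not stated.
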

READ

Plate & Text OCR — PaddleOCR

Two-stage detection + recognition pipeline reads plate characters and indexes them for full-text search, so a vehicle of interest can be traced across cameras the same way a person can.

CLASSIFY

Object, Person & Vehicle — COCO YOLO

80-class detector classifies people, cars, trucks, buses, motorcycles, and more in real time. Drives pedestrian and vehicle events, zone counting, loitering, and the live overlay bounding boxes.

WEAPON ALERT — BETA

Weapon Alert (Beta)

A triage signal — not a verdict. Requires multi-frame confirmation (default 5 frames) at a higher confidence threshold before any alert fires. Human-in-the-loop only; the system takes no autonomous action, ever.
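
Multi-frame confirmation can be sketched as a streak counter. The version below assumes the frames must be consecutive and uses a hypothetical 0.8 confidence threshold; the source gives only the default of 5 frames, so both choices are assumptions. It fires once when the streak is reached, leaving the decision to a human operator.

```python
from typing import Optional


class MultiFrameConfirmer:
    """Raise a triage signal only after N consecutive confident frames."""

    def __init__(self, required_frames: int = 5,
                 min_confidence: float = 0.8) -> None:
        self.required_frames = required_frames
        self.min_confidence = min_confidence  # hypothetical threshold
        self._streak = 0

    def update(self, confidence: Optional[float]) -> bool:
        """Feed one frame's best score (None if nothing was detected).

        Returns True only on the frame where the streak first reaches
        required_frames, so a sustained detection signals once.
        """
        if confidence is not None and confidence >= self.min_confidence:
            self._streak += 1
        else:
            self._streak = 0  # any weak or missing frame resets the streak
        return self._streak == self.required_frames


confirmer = MultiFrameConfirmer()
scores = [0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
fired = [confirmer.update(s) for s in scores]
```
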

EMOTION — BETA

Emotion (Beta)

EfficientNet estimates a primary expression per face across seven classes. Surfaced as an optional chip on the events explorer — a where-to-look hint only, never a conclusion about a person.

TAG

Image Labels — EfficientNet

1,000-class ImageNet classifier tags scene context behind a detection — outdoor, street, vehicle — with a prefilter that drops low-confidence noise before anything is stored.
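
A low-confidence prefilter can be as simple as a threshold plus a cap on how many labels are kept. The sketch below assumes a 0.3 cut-off and a limit of five labels; both values and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def prefilter_labels(scores: Dict[str, float],
                     min_confidence: float = 0.3,
                     max_labels: int = 5) -> List[Tuple[str, float]]:
    """Keep the highest-scoring labels at or above the threshold."""
    kept = [(label, score) for label, score in scores.items()
            if score >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_labels]


# Illustrative classifier output for one event thumbnail.
labels = prefilter_labels({"street": 0.72, "car": 0.41, "umbrella": 0.04})
```
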

BEHAVIOR

Crowd Density, Loitering & Zones

Zone counting reports a live people count per polygon. Loitering tracks dwell time per zone. Line crossing counts directional entries and exits. Motion gating skips empty scenes to save CPU.
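
Zone counting and loitering both reduce to testing whether a tracked point lies inside a polygon. A minimal sketch, assuming each person is represented by a single image point (for example the bottom center of the bounding box) and using the standard ray-casting test; the function names and data shapes are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(points: Sequence[Point], polygon: Sequence[Point]) -> int:
    """Live people count for one zone."""
    return sum(1 for p in points if point_in_polygon(p, polygon))


def dwell_seconds(track: List[Tuple[float, Point]],
                  polygon: Sequence[Point]) -> float:
    """Longest continuous stay inside the zone for a time-sorted track."""
    longest = 0.0
    entered = None
    for timestamp, point in track:
        if point_in_polygon(point, polygon):
            if entered is None:
                entered = timestamp
            longest = max(longest, timestamp - entered)
        else:
            entered = None  # leaving the zone resets the dwell timer
    return longest


zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
people = [(5, 5), (15, 5), (1, 9)]
zone_count = count_in_zone(people, zone)
track = [(0, (5, 5)), (10, (6, 5)), (20, (7, 5)), (30, (20, 5)), (40, (5, 5))]
dwell = dwell_seconds(track, zone)
```

A loitering rule would then compare `dwell` against a per-zone limit.
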

SEE IT LIVE

Real-Time AI Overlay

Per-frame detection blobs stream to the dashboard over WebSocket. Color-coded by meaning — click any box to open its event. A confirmed person keeps the same color across every camera.

Weapon alert and emotion classification are Beta and are not presented as definitive. Performance figures describe design targets on representative hardware; real-world accuracy depends on camera quality, placement, scene, and lighting.

HOW IT WORKS

Frame to decision, in five stages.

1. Frame

The agent pulls frames from each ONVIF / RTSP camera at a configurable detector cadence (typically 1–5 FPS). Motion gating skips empty, stationary scenes.
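
Motion gating can be pictured as a cheap frame-difference check that runs before the detectors. A minimal sketch, assuming grayscale frames held as nested lists of 0 to 255 values and hypothetical thresholds (`pixel_delta`, `min_changed_fraction`); the agent's actual gating logic is not described here.

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale pixels, 0-255


def motion_score(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose value changed by more than pixel_delta."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def should_run_detectors(prev: Optional[Frame], curr: Frame,
                         min_changed_fraction: float = 0.01) -> bool:
    """Skip detection when the scene has not changed since the last frame."""
    if prev is None:
        return True  # always process the first frame
    return motion_score(prev, curr) >= min_changed_fraction


static = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in static]
moved[2][1] = 200  # one pixel changes sharply
run_on_static = should_run_detectors(static, static)
run_on_moved = should_run_detectors(static, moved)
```
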

2. Detect

YOLO finds people and vehicles; SCRFD/YuNet finds faces; PaddleOCR finds plate and text regions. Quality and human-presence filters drop the noise.

3. Embed

ArcFace converts each good face into a 512-D embedding; PaddleOCR reads plate characters; EfficientNet tags scene context and expression.

4. Index

Embeddings land in pgvector (cosine kNN), plates and text go into full-text indexes, and every detection becomes a structured event with class, confidence, box, camera, and timestamp.
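
The structured event described above might look like the record below. The five fields come from the text; the field names, the box layout (x, y, width, height), and the JSON serialization are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DetectionEvent:
    label: str                       # detected class, e.g. "person"
    confidence: float                # detector score in [0, 1]
    box: Tuple[int, int, int, int]   # x, y, width, height in pixels
    camera_id: str
    timestamp: str                   # ISO 8601, UTC

    def to_json(self) -> str:
        """Serialize with sorted keys so output is stable across runs."""
        return json.dumps(asdict(self), sort_keys=True)


event = DetectionEvent(
    label="person",
    confidence=0.98,
    box=(120, 80, 64, 128),
    camera_id="cam-1",
    timestamp="2024-01-01T12:00:00Z",
)
payload = json.loads(event.to_json())
```
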

5. Surface

The result appears everywhere at once — live overlay, alert banner, threat picture, events explorer, cross-camera map, and pattern-of-life profile. All reading the same indexed intelligence.

SPECIFICATIONS

Pipeline technical details

Inference runtime: ONNX Runtime 1.22.0
Face embedding dimensions: 512 (ArcFace)
Face search engine: PostgreSQL pgvector, IVFFlat index, cosine distance
Object classes: 80+ (COCO); face detection via SCRFD (default) / YuNet (lightweight)
Plate / text OCR: PaddleOCR two-stage (detection + recognition); full-text indexed
Image labels: EfficientNet, 1,000 ImageNet classes; low-confidence prefilter before persistence
Detector cadence: Configurable, typically 1–5 FPS per camera (well below playback rate)
Processing location: On-site edge agent (Windows) + server-side AI worker (Linux), same models
Cloud GPU required: No. ONNX Runtime runs on standard CPU hardware on the edge
Internet required for inference: No. Models ship with the agent, fully offline capable
Model bundle size: ~410 MB total (.onnx files), ready to run on first install
Supported cameras: Any ONVIF / RTSP, auto-discovered by WS-Discovery or added by URL
Weapon alert: Beta. Multi-frame confirmation (default 5 frames), higher confidence threshold, human-in-the-loop, no autonomous action
Emotion classification: Beta. 7-class expression estimate, per-class confidence, presented as a triage hint only
Identity confidence: Honest bands in the UI (Identifying / Few samples / Low confidence / Probable / Confirmed), never a false certainty

Specifications describe shipped platform capabilities and design targets on representative hardware. We will confirm the configuration that fits your deployment during your demo.

See the pipeline on live data.

Request demo access and watch the vision pipeline work end to end — faces and plates recognized on the live overlay, a person of interest found across every camera, and the same intelligence surfacing on the investigation map. Every model running on the edge, every action accounted for.