Reliability

Backups that survive bad networks.

Captive portal detection. Bandwidth limiter. Retry classification. Exponential backoff. Stall watchdog. We don't fail a backup because the office Wi-Fi flapped or someone walked behind a microwave.

BigMind agent transferring a backup resiliently over an unreliable network

How it works

A transfer that adapts instead of failing

Every upload runs the same loop: sense whether the network is real, shape the transfer to the bandwidth you allow, send chunk by chunk, watch the heartbeat for stalls, and resume from the last completed chunk if anything drops.

  • Sense network: real internet?
  • Shape: bandwidth caps
  • Transfer: chunk by chunk
  • Watch: stall watchdog
  • Resume: from last chunk

Sense → Shape → Transfer → Watch → Resume — the loop that keeps a backup alive through a bad network.

Five layers of resilience

Every way a network can go wrong, handled.

Each mechanism targets a specific failure mode — so a flaky connection slows a backup down instead of silently breaking it.

Captive portal detection

Hotel Wi-Fi, airport Wi-Fi, coffee shop Wi-Fi: many of these networks sit behind captive portals that intercept web traffic until you sign in. Our agent detects this and pauses the backup until real internet is reachable, instead of reporting the backup as complete when the data only ever reached the portal.
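One common way to detect a captive portal is to request a URL whose exact response is known in advance and check whether anything rewrote it. The sketch below illustrates that general technique and is not the agent's actual implementation. `PROBE_URL`, `NetworkState`, `classify_probe`, and `probe_network` are hypothetical names, and the probe is assumed to answer plain HTTP with status 204 and an empty body.

```python
import urllib.error
import urllib.request
from enum import Enum

# Hypothetical probe endpoint: assumed to answer plain HTTP with 204 and an empty body.
PROBE_URL = "http://probe.example.com/generate_204"


class NetworkState(Enum):
    ONLINE = "online"            # probe came back untouched
    CAPTIVE_PORTAL = "captive"   # something answered on the probe's behalf
    OFFLINE = "offline"          # nothing answered at all


def classify_probe(status, body):
    """Classify a probe response; status is None when the request failed."""
    if status is None:
        return NetworkState.OFFLINE
    if status == 204 and not body:
        return NetworkState.ONLINE
    # Redirects, login pages (200 + HTML), or any other rewrite of the reply.
    return NetworkState.CAPTIVE_PORTAL


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 30x as an HTTPError instead of following it


def probe_network(url=PROBE_URL, timeout=5.0):
    """Fetch the probe URL once and report what kind of network answered."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return classify_probe(resp.status, resp.read(512))
    except urllib.error.HTTPError as err:
        return classify_probe(err.code, b"")   # e.g. a 302 to a login page
    except (urllib.error.URLError, OSError):
        return classify_probe(None, b"")       # DNS failure, no route, timeout
```

A backup loop built on this would hold uploads while `probe_network()` returns anything other than `NetworkState.ONLINE` and probe again after a delay.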

Retry classification

Not all errors are equal.

  • Transient (timeout, 503) → exponential backoff.
  • Auth (401, expired token) → refresh credentials.
  • Hard (404 chunk gone, 403 forbidden) → fail fast, alert.
  • Network gone (no DNS, no route) → pause, monitor for recovery.
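The four error classes above can be sketched as a small lookup plus a backoff helper. This is an illustration under stated assumptions, not the agent's code. The `Action` enum, the error-name strings (`"timeout"`, `"dns_failure"`, `"no_route"`, `"token_expired"`), and the choice to treat unknown errors as hard are hypothetical. The random jitter in `backoff_delay` is a common addition that the page does not mention.

```python
import random
from enum import Enum


class Action(Enum):
    BACKOFF = "exponential backoff"
    REFRESH_AUTH = "refresh credentials"
    FAIL_FAST = "fail fast, alert"
    PAUSE = "pause, monitor for recovery"


def classify(status=None, error=None):
    """Map an HTTP status or a transport error name to a retry action."""
    if error in ("dns_failure", "no_route"):        # network gone
        return Action.PAUSE
    if error == "timeout" or status == 503:         # transient
        return Action.BACKOFF
    if status == 401 or error == "token_expired":   # auth
        return Action.REFRESH_AUTH
    if status in (403, 404):                        # hard
        return Action.FAIL_FAST
    return Action.FAIL_FAST  # assumption: unknown errors are treated as hard


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with jitter: a delay in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Capping the delay keeps a long outage from pushing retries hours apart, and the jitter keeps many devices on the same network from retrying in lockstep.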

Bandwidth limiter

Per-device bandwidth caps with daypart scheduling: "max 5 Mbps during business hours, full speed after 6pm." A token-bucket algorithm paces the upload so it stays under the cap instead of arriving in large bursts. Doesn't tank the office video calls.
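A token bucket can be sketched in a few lines: tokens (here, bytes) refill at the allowed rate, and a sender that overdraws the bucket waits until the debt is repaid. The code below is a minimal sketch of that general algorithm, not the agent's implementation. `TokenBucket`, `cap_for`, and `BUSINESS_HOURS` are hypothetical names, and the 09:00 start of business hours is an assumption, since the example above only gives the 6pm end.

```python
from datetime import time as clock


def cap_for(now, schedule, default_bps=None):
    """Return the cap in bytes/s for a wall-clock time, or default_bps (None = uncapped)."""
    for start, end, bps in schedule:
        if start <= now < end:
            return bps
    return default_bps


class TokenBucket:
    """Tokens are bytes; they refill at rate_bps up to a burst ceiling."""

    def __init__(self, rate_bps, burst_bytes, now=0.0):
        self.rate_bps = rate_bps
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes
        self.last = now

    def wait_time(self, nbytes, now):
        """Reserve nbytes and return how many seconds to sleep before sending them."""
        elapsed = now - self.last
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bps)
        self.last = now
        self.tokens -= nbytes          # may go negative: a debt repaid by waiting
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate_bps


# "max 5 Mbps during business hours, full speed after 6pm"
# Assumption: business hours start at 09:00; the source only states the 6pm end.
BUSINESS_HOURS = [(clock(9, 0), clock(18, 0), 5_000_000 // 8)]  # 5 Mbps = 625,000 bytes/s
```

A sender would call `wait_time(len(chunk), time.monotonic())` before each chunk and sleep for the returned duration. A smaller `burst_bytes` gives smoother traffic at the cost of more frequent short sleeps.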

Stall watchdog

"In progress for 24 hours" is usually a dead backup that crashed silently. Our stall watchdog cron monitors backup heartbeats — if a job stops reporting progress for >X minutes, we mark it stalled, attempt recovery, and re-queue if needed. No more 24-hour "running" jobs that turn out to be ghosts.

Resume from interruption

If a backup is interrupted mid-run (network failure, agent crash, machine sleep), it resumes from the last completed chunk. No re-uploading 50 GB you already sent.
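Resuming from the last completed chunk usually comes down to persisting a checkpoint after every successful chunk and seeking past the finished chunks on the next run. The sketch below illustrates that pattern with a local JSON checkpoint file. The file format, the 4 MiB default chunk size, and the `load_checkpoint`, `save_checkpoint`, `upload`, and `send_chunk` names are all assumptions, not the agent's actual design.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next chunk to send (0 when no usable checkpoint exists)."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return int(json.load(fh)["next_chunk"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return 0


def save_checkpoint(path, next_chunk):
    """Write atomically so a crash mid-write cannot corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"next_chunk": next_chunk}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def upload(source_path, checkpoint_path, send_chunk, chunk_size=4 * 1024 * 1024):
    """Send chunks starting at the checkpoint; record progress after each one."""
    index = load_checkpoint(checkpoint_path)
    with open(source_path, "rb") as src:
        src.seek(index * chunk_size)
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            send_chunk(index, data)   # raises on failure, leaving the checkpoint untouched
            index += 1
            save_checkpoint(checkpoint_path, index)
    return index
```

Because the checkpoint only advances after `send_chunk` returns, an interruption costs at most the one chunk that was in flight.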

Resume, don't restart

Pick up exactly where the network dropped you.

A backup is never an all-or-nothing gamble against your connection. Interruptions — a dropped link, an agent crash, a machine going to sleep — are expected, and the next run continues from the last completed chunk instead of starting over.

  • Resumes from the last completed chunk, not byte zero
  • No re-uploading 50 GB you already sent
  • Stall watchdog re-queues ghost jobs that stopped reporting progress
A resumed backup picking up from the last completed chunk in BigMind

Reliability you only notice when something else fails.

Captive portals, throttled office Wi-Fi, dropped connections, ghost jobs — handled, so your backups keep finishing.