Mutual surveillance (peers)
A monitor is only as good as the machine it runs on. Two Hora nodes (two Raspberry Pi at two friends’ places, a VPS and a homelab…) can watch each other with dead-man heartbeats: each node pings the other while it is healthy; when the pings stop, the survivor alerts.
Nothing here is a bespoke protocol - every exchange is plain HTTP, so a peer can be another Hora, a healthchecks.io / UptimeRobot endpoint, or a cron job.
The outbound heartbeat: [health]
Section titled “The outbound heartbeat: [health]”[health]id = "hora-a" # this node's identity (how peers refer to it)interval_secs = 60 # heartbeat cadence while healthygrace_secs = 180 # startup grace before a never-seen peer is alerted# quorum = true # see below# heartbeat_url = "${HC_PING_URL}" # optional extra dead-man targetThe node POSTs to each peer’s ping_url every interval_secs - but only
while it is locally healthy (its scheduler is ticking and its database is
writable). A hung process stops pinging, and the receiver notices.
Peers: two independent halves
Section titled “Peers: two independent halves”Each [[peers]] entry can declare either or both directions:
[[peers]]id = "hora-b" # the peer's [health].idname = "Hora B (Paris)"# OUT - I heartbeat the peer while I'm healthy:ping_url = "https://b.example/api/push/hora-a"ping_token = "${PEER_B_TOKEN}" # sent as X-Push-Token# IN - I watch the peer and alert if it goes silent:expect_every_secs = 90listen_token = "${PEER_B_IN}" # required from the peer's pings# witness_url = "https://b.example/healthz" # default: origin(ping_url)/healthz# notify = ["ops-telegram"] # route this peer's alertsWatched peers appear in their own section on the status page (their state
does not roll into the overall badge - it tracks your services, not the
surveillance mesh). [health] and [[peers]] reload live like everything
else.
Quorum: outage or partition?
Section titled “Quorum: outage or partition?”With three or more nodes, set quorum = true: before alerting a peer down,
the node asks the other peers’ /healthz whether they still see it. If any
does, it is a network partition between you two - reported as a degraded peer
link, not a false “node down”. A no-op with fewer than three nodes.
Multi-vantage confirmation
Section titled “Multi-vantage confirmation”The peers don’t just watch each other - they can confirm your monitors’ outages from their vantage point. With
[health]id = "hora-a"confirm_with_peers = true # per-monitor override: confirm_with_peers = true/falsea monitor that confirms down locally triggers one concurrent round of probe requests to the peers before the alert goes out, and the alert carries the verdict:
- “confirmed down from 3/3 vantage points” - a real outage;
- “seen UP by hora-b - down from 1/3 vantage points (network issue near this node?)” - probably your fibre, not the service. The alert is softened, never silenced: geo-partial outages are real outages.
Two Raspberry Pi at two homes become a distributed Pingdom.
How a probe request works
Section titled “How a probe request works”The requester POSTs to the peer’s /api/peer/probe (derived from the
origin of ping_url), authenticated with the same token pair as heartbeats:
it presents its ping_token, which the responder checks against its
listen_token for that peer - required here; the id alone never
authorizes a probe.
The responder only probes targets present in its own configuration (matched on kind + target, probed with its own timeout, assertions and proxy) - never an arbitrary target, so a leaked token cannot turn a peer into an SSRF relay. Both nodes must therefore know the monitor, which pairs naturally with sharing the config in git.
Failure behaviour
Section titled “Failure behaviour”Strictly fail-open, by construction: peers being slow, unreachable, behind a wrong token or unaware of the target never block the alert, never delay it past a hard 10-second deadline (probes run concurrently), and never suppress it. The worst possible outcome is an alert without the vantage annotation - exactly what Hora sent before the feature. The incident record is written before the peers are even consulted.
External receivers
Section titled “External receivers”-
OUT-only peer - a plain dead-man to an external service:
[[peers]]id = "healthchecks"name = "healthchecks.io"ping_url = "https://hc-ping.com/<uuid>" -
Be watched by UptimeRobot - point a keyword monitor at this node’s
/healthzand match the keywordok(the top-levelstatusfield isokonly while the node is fully healthy). No peer entry needed.