Uptime Monitoring Guide | ServiceAlert.ai

Monitor types

Each type checks a different layer of your stack. Pick the one that matches the failure mode you actually want to catch — an HTTP check on the home page won't tell you that DNS is broken.

HTTP / HTTPS — fetches a URL and validates status code, response time, and optionally a keyword in the body. Supports GET / POST / HEAD / PUT, custom headers, and basic auth. The default for "is the site up?".
Ping (ICMP) — sends an ICMP echo to a host. Useful for raw network reachability where HTTP isn't exposed.
TCP port — opens a TCP connection to host:port. Use for databases, mail servers, MQTT brokers, anything non-HTTP.
SSL certificate — validates the cert chain and warns N days before expiry (configurable). Catches expired or about-to-expire certs before browsers do.
Domain expiration — reads the registrar's WHOIS expiry and warns N days before. Catches lapsed registrations, which the SSL cert check won't.
DNS — resolves an A / AAAA / MX / NS / CNAME / TXT / SOA record and optionally asserts the resolved value. Catches misconfigured records and unexpected mutations.
Heartbeat — inverse monitor. Instead of us pinging you, your cron job pings us. We alert if we don't hear from it within the expected window. Use it for backup jobs, ETL pipelines, scheduled reports.
Transaction (TXN) — multi-step Playwright flow: visit, fill, click, assert. Catches checkout breaks and login failures that an HTTP 200 won't reveal. Available on Business / Enterprise plans.
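
To make the "warns N days before expiry" behaviour concrete, here is a minimal Python sketch of how an expiry warning could be computed from a certificate's notAfter field. It is an illustration, not ServiceAlert.ai's implementation: the function names and the 14-day default are assumptions.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before a certificate expires.

    `not_after` uses the format of the 'notAfter' field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan  5 09:34:43 2018 GMT".
    `now` is a Unix timestamp (seconds since the epoch).
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def should_warn(not_after: str, now: float, warn_days: int = 14) -> bool:
    """True once the certificate is within `warn_days` of expiring."""
    # warn_days is the configurable "N"; 14 is only a placeholder default.
    return days_until_expiry(not_after, now) <= warn_days
```

A domain expiration check could apply the same comparison to the expiry date read from WHOIS.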

Creating a monitor

Go to /monitor → Add Monitor.
Pick a type. The form's lower sections (HTTP Settings, DNS Settings, etc.) appear based on the type you pick.
Enter a name (free text) and the target: a URL for HTTP, a host for ping/TCP/SSL, a domain for DNS / Domain expiration.
(Optional) Pick a team from the dropdown to make the monitor shared. Personal monitors are private; team monitors are visible to all team members. Team monitors require a Business or Enterprise plan.
Set the check interval and retry policy (see below).
Save. The first check fires within ~30 seconds; results appear on the monitor's detail page.

Intervals, timeouts, and retries

Three knobs decide how aggressively a monitor checks — and how quickly a flap turns into an alert.

Check interval

Free — every 5 minutes
Pro — every 1 or 3 minutes
Business / Enterprise — every 30 seconds

Timeout

How long a single check can take before it's marked a failure. Default 30 seconds. Set lower if you want to alert on slow responses (e.g. 10s for a fast API), higher only if a check legitimately takes minutes (rare).

Retry threshold

How many consecutive failed checks it takes before the monitor flips to down. Default 3. Higher values reduce false positives from transient network blips, but each extra required failure delays detection by one check interval. Lower values catch issues faster but flap more.

Recommended starting point: 1-minute interval (Pro plan or higher), 30-second timeout, retry threshold of 3. That's about 3 minutes from incident start to alert (three consecutive failed checks, one minute apart), with very few false positives.
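
The arithmetic behind that estimate can be sketched in a few lines of Python. This assumes checks run on a fixed schedule with no accelerated retries, which the guide does not state explicitly, and `detection_window` is a hypothetical helper rather than part of the product.

```python
def detection_window(interval_s: float, timeout_s: float,
                     threshold: int) -> tuple[float, float]:
    """Return (best, worst) seconds from incident start to the down alert."""
    # Best case: the incident starts just before a check and every failing
    # check fails instantly, so only (threshold - 1) intervals must elapse.
    best = (threshold - 1) * interval_s
    # Worst case: the incident starts just after a check passed, and the
    # final failing check only gives up when its timeout expires.
    worst = threshold * interval_s + timeout_s
    return best, worst


print(detection_window(60, 30, 3))  # (120, 210), i.e. 2 to 3.5 minutes
```

Lowering the timeout tightens the worst case; lowering the threshold tightens both ends at the cost of more flapping.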

Alerts

Alerts fire on state changes — not on every failed check. Continued downtime won't re-page you; recovery does (when enabled).
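
The difference between "every failed check" and "state change" is easiest to see as a small state machine. The sketch below is a simplified model covering only the up and down states; the names `MonitorState` and `record_check` are illustrative and not taken from the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    status: str = "up"
    consecutive_failures: int = 0


def record_check(state: MonitorState, check_ok: bool,
                 threshold: int = 3) -> Optional[str]:
    """Record one check; return an alert name only when the state changes."""
    if check_ok:
        state.consecutive_failures = 0
        if state.status == "down":
            state.status = "up"
            return "recovery"
        return None
    state.consecutive_failures += 1
    if state.status == "up" and state.consecutive_failures >= threshold:
        state.status = "down"
        return "down"
    # Still failing while already down (or below the threshold): no alert.
    return None
```

Feeding it three failures, a fourth failure, and then a success yields exactly two alerts: one "down" and one "recovery".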

Channels

Email — Pro, Business, Enterprise.
Slack, Microsoft Teams, Discord, Google Chat — Enterprise.
Webhooks — Enterprise. POSTs a JSON payload to your URL on every state change. Same shape regardless of monitor type.
PagerDuty / Opsgenie / Jira — Enterprise. Trigger / acknowledge / resolve incidents from monitor state changes.
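
Because webhooks deliver a JSON POST on every state change, a receiver mostly needs to parse the body and act on the transition. The Python sketch below covers that parsing step only. The payload schema is not documented in this guide, so the field names (`monitor`, `previous_state`, `current_state`) are placeholders; inspect a real delivery before relying on them.

```python
import json
from typing import Optional


def describe_state_change(body: bytes) -> Optional[str]:
    """Turn a webhook body into a one-line summary, or None if unusable."""
    try:
        event = json.loads(body)
    except ValueError:
        return None  # body is not valid JSON
    if not isinstance(event, dict):
        return None  # expected a JSON object
    # Hypothetical field names: replace with the keys of a real payload.
    monitor = event.get("monitor", "unknown monitor")
    before = event.get("previous_state", "unknown")
    after = event.get("current_state", "unknown")
    return f"{monitor}: {before} -> {after}"
```

Since the shape is the same for every monitor type, one parser like this can serve all monitors.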

What triggers an alert

By default, a monitor alerts when it transitions to down. Configure the Alert On picker to also fire on:

Degraded — soft failures (slow response, partial keyword match)
Partial outage — some regions failing
Major outage — all regions failing
Recovery — transition back to up
Maintenance — on entering/leaving a planned-maintenance window
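
One way to picture the partial and major outage states is as a function of per-region check results. This sketch assumes each check reports pass or fail per region; the `classify` helper and the region names in the example are illustrative, and degraded (soft-failure) detection is left out.

```python
def classify(region_ok: dict[str, bool]) -> str:
    """Map per-region pass/fail results to an outage state."""
    if not region_ok:
        raise ValueError("need at least one region result")
    failing = [region for region, ok in region_ok.items() if not ok]
    if not failing:
        return "up"
    if len(failing) == len(region_ok):
        return "major outage"    # all regions failing
    return "partial outage"      # some regions failing


print(classify({"us-east": True, "eu-west": False}))  # partial outage
```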

Cooldown

If a monitor flaps repeatedly within a short window, the cooldown suppresses follow-up alerts on the same monitor for the same user. Default 30 minutes; configurable per channel. Recovery alerts ignore the cooldown.
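
The cooldown rule can be modelled as a single decision per outgoing alert. This is a sketch under assumptions: `should_send` is a hypothetical helper, timestamps are in seconds, and an alert arriving exactly when the cooldown ends is treated as allowed.

```python
from typing import Optional


def should_send(last_sent_at: Optional[float], now: float,
                is_recovery: bool, cooldown_s: float = 30 * 60) -> bool:
    """Decide whether an alert for one monitor and one user goes out."""
    if is_recovery:
        return True   # recovery alerts ignore the cooldown
    if last_sent_at is None:
        return True   # nothing sent yet, so nothing to suppress
    return now - last_sent_at >= cooldown_s
```

With the default 30-minute cooldown, a second down alert 10 minutes after the first is suppressed, while a recovery at the same moment still goes out.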

Status pages

A status page is a public or private dashboard that shows the live state of selected monitors and any active incidents. Customers / stakeholders see it; you don't have to write outage updates by hand.

Go to /monitor/status-page-editor.
Pick the monitors to include and the layout.
Choose Public (anyone with the URL can view) or Private (login-gated).
Optionally point your own subdomain at it (e.g. status.yourcompany.com) by adding a CNAME record with your DNS provider and entering the hostname in the editor.

Declared incidents are written manually and surface above the monitor grid. Use them for incident comms (root cause, ETA, postmortem link) when monitors alone don't tell the whole story.

On-call schedules

Route alerts to whoever's actually on rotation, not the whole team's inbox at 3am. Available on Enterprise.

Go to /monitor/on-call and create a schedule.
Add team members and define rotation length (daily / weekly / custom).
Optionally set an escalation policy: if the primary doesn't acknowledge within N minutes, page the secondary, then a manager, etc.
On any alerting monitor or team alert config, pick the schedule under Notify. Alerts then go to whoever is on call when the alert fires.
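
Rotation and escalation both reduce to simple time arithmetic. The Python sketch below assumes a plain round-robin rotation and a single acknowledgement timeout shared by every escalation level; the function names and member names are made up for illustration.

```python
from datetime import datetime, timedelta


def on_call_at(members: list[str], rotation_start: datetime,
               rotation_length: timedelta, when: datetime) -> str:
    """Return who is on call at `when` in a round-robin rotation."""
    shifts_elapsed = (when - rotation_start) // rotation_length
    return members[shifts_elapsed % len(members)]


def paged_so_far(chain: list[str], ack_timeout_min: float,
                 minutes_unacknowledged: float) -> list[str]:
    """Return everyone paged so far while the alert stays unacknowledged."""
    # Level 1 is paged immediately; each timeout adds the next level.
    levels = int(minutes_unacknowledged // ack_timeout_min) + 1
    return chain[:min(levels, len(chain))]


start = datetime(2024, 1, 1)
print(on_call_at(["ana", "ben", "cy"], start, timedelta(weeks=1),
                 datetime(2024, 1, 10)))                           # ben
print(paged_so_far(["primary", "secondary", "manager"], 10, 25))
```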

Maintenance windows

Planned downtime shouldn't page you. A maintenance window suppresses alerts and marks the monitor's state as maintenance on the status page so customers know it's intentional.

Go to /monitor/maintenance and click Schedule maintenance.
Pick the monitor(s), start time, duration, and optional description.
Optionally make it recurring (weekly deploys, monthly patching, etc.).

During the window, failed checks are recorded but don't trigger alerts and don't count against the monitor's uptime SLA. The status page shows the maintenance banner.
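
As a worked example of the SLA rule, the sketch below computes uptime while ignoring failures that fall inside a maintenance window. It reflects one reading of the rule (failed checks in the window are dropped from the calculation, successful ones are kept); the actual report may use a different method, and `uptime_percent` is a hypothetical helper.

```python
def uptime_percent(checks: list[tuple[bool, bool]]) -> float:
    """Uptime over a list of (ok, in_maintenance) pairs, one per check."""
    # Keep every successful check, and failed checks only when they
    # happened outside a maintenance window.
    counted = [ok for ok, in_maintenance in checks if ok or not in_maintenance]
    if not counted:
        return 100.0  # nothing counted, so nothing counted against uptime
    return 100.0 * sum(counted) / len(counted)


# Three passes plus one failure during maintenance: still 100% uptime.
print(uptime_percent([(True, False)] * 3 + [(False, True)]))  # 100.0
```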

Heartbeat monitors (deep-dive)

Heartbeats invert the model: your code pings us, and we alert if we don't hear from it. Use them for things that don't expose an HTTP surface — nightly DB backups, ETL jobs, payroll runs, weekly digests.

Setup

Create a heartbeat monitor and set the expected interval (e.g. every 24h).
Copy the unique heartbeat URL shown on the monitor.
From your job, send a GET or POST to that URL on success. Curl works fine:

# In your cron job, after success
curl -fsS -m 10 https://servicealert.ai/api/heartbeat/<token> > /dev/null

If we don't see a ping within the expected interval (plus a small grace period), the monitor flips to down and alerts fire on whatever channels you've configured.
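
The overdue test itself is a one-line comparison. In this Python sketch the 5-minute grace period is a placeholder, since the guide only says "small", and `is_overdue` is an illustrative name.

```python
def is_overdue(last_ping_at: float, now: float,
               expected_interval_s: float, grace_s: float = 300.0) -> bool:
    """True when no ping arrived within the interval plus the grace period."""
    return now - last_ping_at > expected_interval_s + grace_s
```

For a daily job (86,400 seconds), a ping that is three minutes late stays inside the assumed grace period; one that is ten minutes late flips the monitor to down.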

Tip: only ping after the job actually succeeded. Pinging unconditionally turns the heartbeat into "the host is alive", which a regular HTTP/Ping monitor already does.
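
For jobs written in Python, the same "ping only on success" rule can be wrapped around the job itself. This is a minimal sketch: `run_then_ping` and `ping` are hypothetical helpers, and the URL you pass in is the heartbeat URL copied from the monitor.

```python
import logging
import urllib.request
from typing import Callable


def ping(url: str, timeout: float = 10.0) -> bool:
    """Send a GET to the heartbeat URL; True on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def run_then_ping(job: Callable[[], None], url: str,
                  ping_fn: Callable[[str], bool] = ping) -> bool:
    """Run the job and ping only if it finished without raising."""
    try:
        job()
    except Exception:
        # Stay silent so the heartbeat goes overdue and the alert fires.
        logging.exception("job failed; skipping heartbeat ping")
        return False
    return ping_fn(url)
```

Passing `ping_fn` as a parameter keeps the wrapper testable without touching the network.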

Transaction monitors

An HTTP 200 on the home page doesn't tell you whether a real user can actually log in, add to cart, or check out. Transaction monitors run a scripted Playwright flow against your site every N minutes and alert when any step fails or the whole flow exceeds a duration budget.

Setup

Go to /monitor/transactions → New transaction.
Add steps: Visit URL, Fill field, Click selector, Assert text / status, etc. Each step gets a friendly name that appears in alert messages.
Set a per-step timeout and an end-to-end budget (alert if the full flow takes longer than X seconds).
Save. The first run kicks off within a minute; failures show which step broke and an attached screenshot.
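
The interplay between per-step timeouts and the end-to-end budget can be modelled without a browser. The sketch below is a plain-Python stand-in for the real Playwright runner, so the names (`run_flow`, `FlowResult`) are hypothetical, and a step's duration is checked after the step returns rather than by interrupting it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FlowResult:
    passed: bool
    failed_step: Optional[str]
    reason: str


def run_flow(steps: list[tuple[str, Callable[[], None]]],
             step_timeout_s: float, budget_s: float,
             clock: Callable[[], float] = time.monotonic) -> FlowResult:
    """Run named steps in order; stop at the first failure."""
    flow_start = clock()
    for name, action in steps:
        step_start = clock()
        try:
            action()
        except Exception as exc:
            return FlowResult(False, name, f"step raised {exc!r}")
        if clock() - step_start > step_timeout_s:
            return FlowResult(False, name, "step exceeded its timeout")
        if clock() - flow_start > budget_s:
            return FlowResult(False, name, "flow exceeded its budget")
    return FlowResult(True, None, "ok")
```

The step name in the result plays the role of the friendly name that appears in alert messages.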

Test users: create dedicated synthetic accounts for transactions so a failed login doesn't lock out a real customer. Mark synthetic traffic in your own analytics by passing ?utm_source=servicealert-synthetic in the first step's URL.
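
If you generate transaction URLs from a script, the marker can be appended without disturbing existing query parameters. A small sketch using Python's standard library; `tag_synthetic` is an illustrative helper and shop.example.com is a stand-in domain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_synthetic(url: str, source: str = "servicealert-synthetic") -> str:
    """Return `url` with utm_source set to the synthetic marker."""
    parts = urlsplit(url)
    # Drop any existing utm_source so the marker appears exactly once.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "utm_source"]
    query.append(("utm_source", source))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tag_synthetic("https://shop.example.com/login"))
# https://shop.example.com/login?utm_source=servicealert-synthetic
```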

Reading the monitors dashboard

Eagle Eye hex grid — one tile per monitor, colour-coded by current state. Hover for the latest check; click to drill in.
Monitor detail — full check history, response-time graph, region-by-region breakdown, SSL chain (for HTTPS), recent incidents, current alert configuration.
Reports (/monitor/reports) — uptime / SLA summaries by month, exportable CSV. Use them for vendor SLA conversations and customer trust pages.
Incidents — auto-opened on transition to down, auto-closed on recovery. Manually-declared incidents (with comms) live alongside.