Observe Outage History

Uptime record, past incidents, and downtime history for Observe.

90-Day Trend

Monthly Uptime

Month	Uptime	Days Tracked	Days with Issues
May 2026	28%	25	18
April 2026	100%	30	0
March 2026	90.3%	31	3
February 2026	100%	5	0

Uptime is calculated from daily worst-status snapshots. A day with any non-operational status counts as a day with issues.
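The calculation described above can be sketched in a few lines. This is a minimal illustration, not the status page's actual implementation: the function names, the status labels, and the choice to treat "no_data" days as untracked are all assumptions made for the example.

```python
def uptime_percent(days_tracked: int, days_with_issues: int) -> float:
    """Percentage of tracked days with no issues, rounded to one decimal."""
    if days_tracked <= 0:
        raise ValueError("days_tracked must be positive")
    if not 0 <= days_with_issues <= days_tracked:
        raise ValueError("days_with_issues must be between 0 and days_tracked")
    return round(100 * (days_tracked - days_with_issues) / days_tracked, 1)


def summarize(daily_worst_status: list[str]) -> tuple[int, int]:
    """Return (days_tracked, days_with_issues) from one worst-status label per day.

    Assumption: a "no_data" day is excluded from the tracked days.
    Any tracked status other than "operational" counts as a day with issues.
    """
    tracked = [s for s in daily_worst_status if s != "no_data"]
    issues = sum(1 for s in tracked if s != "operational")
    return len(tracked), issues
```

Applied to the table, uptime_percent(25, 18) gives 28.0 for May 2026 and uptime_percent(31, 3) gives 90.3 for March 2026, which matches the published figures.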

Daily Status (Last 91 Days)

[Chart: daily status from Feb 24 to today. Legend: Operational, Degraded, Partial Outage, Major Outage, Maintenance, No Data.]

Incident History

May 2026

2026-05-13 - Monitor Evaluation Delays - US West (Oregon)

major

Started: May 14, 3:30 AM

monitoring

A fix has been implemented and we are monitoring the results.

May 14, 6:12 AM

investigating

We are continuing to investigate the issue.

May 14, 5:04 AM

investigating

We are currently investigating an issue causing monitor evaluation delays in US West (Oregon). Monitor alerts may not fire normally. No data loss has occurred.

May 14, 3:30 AM

2026-05-12 Monitor emails not sending

Started: May 13, 1:40 AM

monitoring

A fix has been implemented and we are monitoring the results.

May 13, 3:45 PM

identified

The issue with outbound email delivery has recurred. Our team has re-engaged and is actively working on a resolution.

May 13, 9:36 AM

monitoring

A fix has been implemented and we are monitoring to confirm full recovery. Email delivery has been restored for new notifications going forward. Email alerts and scheduled reports triggered during this incident will not be delivered. We will provide a final update once we have confirmed the service is fully restored.

May 13, 5:33 AM

identified

We have identified the root cause of monitor email delivery failures affecting all Observe production regions and are working on a resolution.

May 13, 3:12 AM

investigating

We are investigating an issue affecting outbound email delivery from the Observe platform in all production regions. Monitor alert notifications, scheduled report deliveries, and other product emails are not being sent. Our team is actively investigating the root cause and working to restore normal performance as quickly as possible. We'll provide updates as we learn more.

May 13, 1:40 AM

2026-05-12 — Security Notification — Email Incident (All Regions)

major

Started: May 12, 5:27 PM

monitoring

Our investigation has concluded. The root cause was identified as a free trial tenant abusing a test email endpoint. The offending tenant has been disabled and we are deploying additional controls to prevent this type of misuse in the future. No customer data was compromised and the impact was limited to the unauthorized emails sent from this single vector. We apologize for the inconvenience and appreciate your patience.

May 12, 10:20 PM

monitoring

A fix has been implemented and we are monitoring the results.

May 12, 6:08 PM

investigating

We recently identified a security issue involving an unauthorized email sent from our domain. We are aware of this matter and are treating it with the highest priority. Our investigation is ongoing, and at this time we believe the scope of this incident was limited to a single vector, which has since been disabled. We have no indication of broader impact. We will provide an update as our investigation progresses. In the meantime, if you received any suspicious emails purporting to be from us,...

May 12, 5:27 PM

2026-05-08 AI SRE Query Failures (US-WEST)

major

Started: May 8, 2:46 PM

monitoring

A fix has been implemented and we are monitoring the results.

May 8, 3:20 PM

investigating

We are continuing to investigate this issue.

May 8, 3:07 PM

investigating

Some customers may experience failed or unresponsive AI SRE queries. Affected users may see errors or timeouts when submitting queries. The team is actively investigating.

May 8, 2:46 PM

March 2026

2026-03-25 High Rate of Ingest Errors (US West)

Started: Mar 26, 1:25 AM

monitoring

A fix has been implemented and we are monitoring the results.

Mar 26, 1:28 AM

investigating

We’ve identified an issue causing elevated errors for ingest endpoints in the US West region. As a result, data ingest will be affected. Our team is actively investigating the root cause and working to restore normal performance as quickly as possible. We’ll provide updates as we learn more. Thank you for your patience!

Mar 26, 1:25 AM

2026-03-20 Intermittent ingest errors and query instability

critical

Started: Mar 21, 6:21 AM

monitoring

A fix has been applied and the system is recovering.

Mar 21, 6:33 AM

identified

The issue has been identified by Snowflake engineering teams and a mitigation is being applied.

Mar 21, 6:23 AM

investigating

We’ve identified an issue causing data ingest to have higher than normal errors and queries to be unreliable in the following regions: US-WEST-2. As a result, some users may experience higher than normal ingest errors and slow queries. We do not anticipate any data loss. This incident is related to an ongoing Snowflake incident - https://status.snowflake.com/. Our team is actively investigating the root cause and working to restore normal performance as quickly as possible. We’ll provi...

Mar 21, 6:21 AM

Unable to access Observe Tenant

Started: Mar 19, 12:20 AM

monitoring

We are aware that some users may be unable to log into Observe. The issue has been mitigated and we're monitoring.

Mar 19, 12:20 AM

Ingest performance degradation in Prod EU cluster

minor

Started: Mar 16, 6:55 PM

monitoring

A large increase in tracing data caused ingestion lag on the prod-eu-1 cluster, affecting multiple customers. Encoder replicas were scaled up and subsequently rolled to resolve unassigned partition workers. Memory pressure was identified as a contributing factor. Ingestion lag peaked at up to 45 minutes for some customers and has since recovered to near-normal levels. The team is continuing to monitor.

Mar 16, 6:55 PM

Service Degradation

minor

Started: Mar 16, 6:44 PM

investigating

A large increase in tracing data caused ingestion lag on the prod-eu-1 cluster. Encoder replicas were scaled up, but some partition workers were not reading data due to unassigned partitions. Encoders were rolled, and partition lag is now decreasing. The team is continuing to monitor recovery.

Mar 16, 6:44 PM

Performance Degradation in GCP

minor

Started: Mar 12, 2:51 PM

monitoring

The issue with elevated warehouse resume times in GCP has been mitigated. Query queueing times have returned to normal levels and resume times have stabilized. A case was raised with the infrastructure provider and warehouse resource management was adjusted to reduce customer impact during the incident. We are continuing to monitor to confirm the issue is fully resolved.

Mar 12, 5:15 PM

monitoring

We are continuing to monitor for any further issues.

Mar 12, 2:52 PM

monitoring

Our internal team identified an issue with warehouse resume times in Prod GCP, causing p99 queueing times of up to 2 minutes (including for user queries). We believe the issue has been addressed and are monitoring on our side.

Mar 12, 2:51 PM
