Grafana Cloud Outage History

Uptime record, past incidents, and downtime history for Grafana Cloud.



Monthly Uptime

Month	Uptime	Days Tracked	Days with Issues
July 2026	33.3%	9	6
June 2026	3.3%	30	29
May 2026	0%	31	31
April 2026	0%	21	21

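Uptime is calculated from daily worst-status snapshots: each day is classified by the worst status observed that day, and a day with any non-operational status counts as a day with issues. Monthly uptime is the share of tracked days without issues, (Days Tracked − Days with Issues) / Days Tracked. For example, July 2026 gives (9 − 6) / 9 = 33.3%.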
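The calculation rule above can be sketched in a few lines of Python. This is a minimal sketch, not the page's actual implementation, which is not documented. It rests on these assumptions:

- Each day is stored as a single worst-status string. The lowercase status names here are hypothetical versions of the chart legend.
- "No Data" days are excluded from Days Tracked.
- The function name `monthly_uptime` is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical status names modeled on the chart legend
# (Operational, Degraded, Partial Outage, Major Outage, Maintenance, No Data).
OPERATIONAL = "operational"
NO_DATA = "no_data"


def monthly_uptime(daily_worst: dict[date, str]) -> dict[str, tuple[float, int, int]]:
    """Return {"YYYY-MM": (uptime_percent, days_tracked, days_with_issues)}.

    Assumption: "no_data" days are not tracked; every other
    non-operational status (including maintenance) counts as a day with issues.
    """
    tracked: dict[str, int] = {}
    issues: dict[str, int] = {}
    for day, worst_status in daily_worst.items():
        if worst_status == NO_DATA:
            continue  # untracked day: counts toward neither total
        key = f"{day.year:04d}-{day.month:02d}"
        tracked[key] = tracked.get(key, 0) + 1
        if worst_status != OPERATIONAL:
            issues[key] = issues.get(key, 0) + 1
    result = {}
    for key in sorted(tracked):
        n, bad = tracked[key], issues.get(key, 0)
        # Uptime = share of tracked days with no issues, rounded to one decimal.
        result[key] = (round(100 * (n - bad) / n, 1), n, bad)
    return result


if __name__ == "__main__":
    start = date(2026, 7, 1)
    # Hypothetical snapshot: 6 of the first 9 July days had a non-operational worst status.
    snapshot = {
        start + timedelta(days=i): ("degraded" if i < 6 else OPERATIONAL)
        for i in range(9)
    }
    print(monthly_uptime(snapshot))  # {'2026-07': (33.3, 9, 6)}
```

With 6 of 9 tracked days having issues, the sketch reproduces the 33.3% figure shown for July 2026. The same rule gives 3.3% for June, where 29 of 30 days had issues.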

Daily Status (Last 91 Days)

[Chart: daily worst status from Apr 10 to today. Legend: Operational, Degraded, Partial Outage, Major Outage, Maintenance, No Data.]

Incident History

July 2026

PDC Degraded performance

critical

Started: Jul 9, 11:37 AM

monitoring

A fix has been applied and we are seeing recovery. We will continue to monitor this.

Jul 9, 1:04 PM

investigating

We are still currently investigating the issue.

Jul 9, 12:36 PM

investigating

We are currently seeing degraded performance in the PDC service hosted on multiple clusters. Our engineering team is working on a fix, and we apologize for any inconvenience.

Jul 9, 11:37 AM

Delayed ingestion and recording rule evaluation failures for Mimir in prod-ap-south-1

minor

Started: Jul 9, 10:28 AM

monitoring

A fix has been implemented. We are currently monitoring the results.

Jul 9, 11:18 AM

investigating

We are observing delayed ingestion and recording rule evaluation failures for Mimir in prod-ap-south-1. So far we have not observed any customer impact; however, we are continuing to monitor the cell.

Jul 9, 10:28 AM

Grafana rulers crash-looping on prometheus

minor

Started: Jul 8, 12:17 PM

monitoring

We are in the process of rolling out the fix. Our engineers are monitoring the progress.

Jul 8, 2:23 PM

investigating

We are investigating issues with the grafana-ruler service on prod-us-east-2 and prod-us-west-0 that are causing periodic crashes. A code fix is currently being deployed to mitigate this.

Jul 8, 12:17 PM

Cannot access dashboard set up as a home page

minor

Started: Jul 8, 10:28 AM

investigating

We are currently investigating an issue where some customers are unable to load or access a dashboard that is configured as their home page. We will provide an update once we have more information.

Jul 8, 10:28 AM

Grafana Cloud IRM alert groups failing to produce alerts in us-east-3 region.

major

Started: Jul 5, 11:37 AM

monitoring

Work to bring services back to a healthy state is complete, and alerts should now be created correctly and without delay. We are keeping the incident open while we monitor for any further issues.

Jul 5, 1:38 PM

identified

Some IRM alert groups in the us-east-3 region may be unable to produce alerts and may be slow or behave unexpectedly in general. The issue started around 01:00 UTC on July 3. The cause has been identified and fixed, and our team is actively working on returning the service to a healthy state.

Jul 5, 11:37 AM

Some Queries Failing

minor

Started: Jul 2, 6:10 PM

identified

We are continuing to work on rolling back a PR responsible for this behavior. Once we have more information, we will share it here. Thank you for your patience.

Jul 2, 8:16 PM

identified

Queries that use drop __error__ are failing due to a bug in series limit checks. This includes log volume histogram queries, which generate the histogram visualization in Grafana. The root cause has been identified, and we are working on a fix.

Jul 2, 6:10 PM

Loki and Frontend Observability - Major Outage in prod-us-central-0 region

critical

Started: Jul 2, 4:55 AM

investigating

The AWS Logs integration in the same region is affected as well. We will provide further updates as our investigation progresses.

Jul 2, 4:57 AM

investigating

We are currently investigating a major outage in Loki writes and Frontend Observability in the prod-us-central-0 region. Our Engineering team is investigating this and we will provide further updates as our investigation progresses.

Jul 2, 4:55 AM

Elevated Loki Query Bytes Reporting

minor

Started: Jul 1, 7:43 PM

investigating

We are investigating an issue where some customers may see higher Loki query byte usage reported than was actually consumed. This affects usage reporting only; there is no impact to query execution or service availability. The issue began at approximately 13:20 UTC and is ongoing. We expect the issue to be resolved soon and will provide another update as more information becomes available.

Jul 1, 7:43 PM

June 2026

Confluent API Outage

major

Started: Jun 29, 12:37 PM

investigating

We are investigating an issue affecting Confluent metrics ingestion across all regions. Due to an elevated error rate on the Confluent side, some metrics may not be ingested, resulting in potential data loss. We are actively investigating the issue and will provide updates as more information becomes available.

Jun 29, 12:37 PM

Mimir read errors and high latency in prod-eu-west-0

minor

Started: Jun 29, 11:37 AM

identified

We've identified a possible cause, and a mitigation is in place to prevent further occurrences.

Jun 29, 5:47 PM

investigating

Errors and latency have now recovered; we are continuing to investigate the root cause.

Jun 29, 3:37 PM

investigating

The errors are recovering, and we are still looking into the root cause of this.

Jun 29, 1:52 PM

investigating

We are currently investigating read errors and high latency affecting Mimir in prod-eu-west-0. This incident is ongoing. The errors are recovering, but we are still looking into the root cause.

Jun 29, 11:37 AM

Rule evaluation error on cluster prod-gb-south-0

minor

Started: Jun 29, 10:34 AM

monitoring

A fix has been applied and we are currently monitoring results.

Jun 29, 11:14 AM

investigating

We are currently investigating rule evaluation errors on the prod-gb-south-0 cluster that are causing error codes to appear within affected stacks. We will provide updates as we learn more.

Jun 29, 10:34 AM

K6 - Test run metrics processing is delayed

minor

Started: Jun 26, 1:08 PM

investigating

We've reduced the metric ingestion delay and are working on additional fixes to bring it down to the expected range. After starting a test, customers can currently expect a delay of 2 to 5 minutes before their test run metrics appear.

Jun 26, 11:30 PM

investigating

Update: We changed the incident title to "Test run metrics processing is delayed." We have found the issue and are working on deploying the fix.

Jun 26, 4:33 PM

investigating

We are experiencing intermittent delays in secondary metrics processing for k6 Cloud test runs due to heavy load. We don't expect any data loss or impact on user runs, but results may take longer to appear in the UI.

Jun 26, 2:09 PM

investigating

A small update: the issue is isolated to new test runs, and users can still view the metrics of all previous test runs.

Jun 26, 1:19 PM

investigating

We’re currently investigating an issue that prevents metrics from appearing during test runs. Our team is actively working to identify the cause. Thank you for your patience.

Jun 26, 1:08 PM

Rule Evaluation Outage in prod-us-central-0

major

Started: Jun 19, 4:39 PM

monitoring

We’re continuing to track progress post-mitigation. While we don’t have new information to share yet, our team remains actively engaged.

Jun 19, 5:46 PM

monitoring

We had an outage affecting rule evaluations between 15:16 and 15:59 UTC in the prod-us-central-0 region. Our team quickly identified the issue and has since mitigated it. The engineering team is monitoring.

Jun 19, 4:39 PM

Issues with actions in the Grafana IRM mobile app

major

Started: Jun 18, 5:01 PM

monitoring

We've verified a fix in our staging environment to restore functionality to the mobile app. The fix is currently being deployed to production. Thanks for your patience as we continue to roll this out and monitor the resolution.

Jun 18, 6:09 PM

identified

We're noticing an uptick in users being unable to respond to actions on the mobile app (acknowledging and silencing alerts, for example). Users working in the web UI should not be affected. Ingestion and notification delivery are working as expected. We have a fix in place and are in the process of deploying.

Jun 18, 5:01 PM

Potential Issues Loading Grafana for Users in India

major

Started: Jun 18, 3:18 PM

monitoring

Error rates have remained near zero, and we continue to monitor.

Jun 23, 5:54 PM

monitoring

We are continuing to monitor for further issues.

Jun 23, 1:15 PM

monitoring

We have deployed additional mitigations that should help with remaining errors. We are continuing to monitor error rates.

Jun 23, 9:14 AM

monitoring

We’ve verified and begun implementing a fix that should reduce loading errors. We are continuing to roll it out to all regions and are monitoring its effectiveness.

Jun 22, 5:16 PM

monitoring

We're actively monitoring this issue and working with our third-party provider. The next update will be posted on Monday unless there is new information to share.

Jun 19, 7:07 PM

monitoring

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr We are continuing to work with our CSP on this investigation. Impacted users may receive intermittent error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on where the user is physically located, not on the region where the stack is hosted.

Jun 19, 2:33 AM

monitoring

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr Impacted users may receive intermittent error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on where the user is physically located, not on the region where the stack is hosted. We continue to work with our CSP on this investigation.

Jun 18, 4:57 PM

investigating

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr Impacted users may receive error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on where the user is physically located, not on the region where the stack is hosted. We are currently investigating this issue on our end and will provide updates as they become available.

Jun 18, 3:18 PM

Degraded k6 cloud UI performance

critical

Started: Jun 18, 11:25 AM

monitoring

We are continuing to monitor for any further issues.

Jun 18, 2:04 PM

monitoring

The root cause of the issue has been identified and a fix has been successfully deployed. We are observing widespread improvements across all systems. Our team is currently monitoring the environment to ensure performance remains stable.

Jun 18, 1:15 PM

investigating

We are continuing to investigate this issue.

Jun 18, 11:45 AM

investigating

We’re currently investigating an issue resulting in degraded k6 cloud UI performance and API response time. Our team is actively working to rectify this issue.

Jun 18, 11:25 AM

Frontend Observability - Suspected commit feature not working as expected

minor

Started: Jun 18, 8:48 AM

investigating

We’re currently investigating an issue affecting the Frontend Observability product: the "Suspected commit" feature is not working as expected. Ingestion and querying are unaffected. Our team has identified the cause and is actively working on a fix. Thank you for your patience.

Jun 18, 8:48 AM

Loki data source-managed alert rules not visible in the Grafana Cloud Alerting UI

major

Started: Jun 17, 8:17 PM

monitoring

A fix has been implemented and we are monitoring the results.

Jun 18, 8:04 AM

identified

We are continuing to deploy the fix and monitor recovery efforts. As part of the rollout, we identified an issue that required adjustments to our deployment plan, which has extended the timeline for mitigation. Work remains actively underway, and we will share additional updates as progress continues.

Jun 17, 11:22 PM

identified

Deployment of the fix is still in progress. We are continuing to monitor the rollout and validate recovery across affected systems. We will share further updates as they become available.

Jun 17, 9:43 PM

identified

Our Engineering Team has implemented a fix which is now being rolled out. We will continue to monitor the situation and update as soon as we have more information.

Jun 17, 8:55 PM

identified

We have identified an issue where alert rules and alerts managed directly in a Loki data source (data source-managed alerting) are not displayed in the Grafana Cloud Alerting UI. Rules created via Prometheus/Mimir data sources and Grafana-managed alert rules are not affected. Impact is limited to visibility and management in the UI. Affected alert rules continue to evaluate and send notifications normally — there is no impact to alert delivery. Workaround: Loki alert rules can still be vi...

Jun 17, 8:17 PM

Grafana Dashboards page not displaying when set to ‘View by Folders’

major

Started: Jun 10, 10:49 AM

investigating

We’re currently investigating an issue affecting the Grafana Dashboards page: when it is set to ‘View by folders’, no dashboards are shown. Our team is working on a fix. In the meantime, switching to ‘View as list’ allows access to dashboards as usual.

Jun 10, 10:49 AM

Investigating Issues with Data Source-Managed Alerting

major

Started: Jun 9, 11:11 PM

monitoring

Our team has implemented a fix and we are currently monitoring the results of this.

Jun 10, 10:57 AM

investigating

We are currently investigating an issue affecting data source-managed alerting management functionality in Grafana Cloud. Customers may experience problems viewing, creating, updating, or managing alerts through Grafana when using data source-managed alerting. This issue is limited to alert management functionality within Grafana. Alert evaluation and backend alerting services continue to operate normally. Direct alerting APIs for Mimir and Loki remain fully operational and are unaffected. ...

Jun 9, 11:11 PM

IRM Degraded Performance

minor

Started: Jun 8, 10:45 AM

monitoring

We've released a fix to the IRM app that should restore service for affected customers with issues related to labels. Thanks for your patience while investigating. We're continuing to monitor as we confirm the resolution in place.

Jun 8, 5:11 PM

identified

We are continuing to work on a fix for this. To further clarify, this issue is not about accessing IRM or alert ingestion/notification/delivery, but rather with handling labels.

Jun 8, 3:21 PM

identified

The degraded performance relates to label handling, and we have now observed it in additional regions.

Jun 8, 1:14 PM

identified

We are continuing to work on a fix for this issue.

Jun 8, 12:26 PM

identified

The issue has been identified and a fix is being implemented.

Jun 8, 11:15 AM

investigating

We are continuing to investigate this issue.

Jun 8, 10:47 AM

investigating

We are seeing IRM access issues caused by an elevated rate of HTTP 500 API responses in prod-us-central-0.

Jun 8, 10:45 AM

Brief Rule Evaluation Failures in prod-eu-west-3

major

Started: Jun 7, 1:39 AM

monitoring

The incident has been mitigated, and services are operating normally. We continue to monitor the service to ensure full stability.

Jun 8, 9:36 PM

monitoring

The incident has been mitigated, and services are operating normally. We are currently monitoring the service to ensure full stability.

Jun 7, 11:00 AM

investigating

We’re making ongoing progress on the investigation alongside our upstream provider.

Jun 7, 6:00 AM

investigating

We are continuing to investigate this issue.

Jun 7, 3:50 AM

investigating

Intermittent spikes in rule evaluation failures are continuing.

Jun 7, 2:46 AM

investigating

From 00:20:00 to 00:27:00, and again from 00:32:00 to 00:38:00, there were brief spikes in rule evaluation failures. Engineers are investigating.

Jun 7, 1:39 AM

Permissions Issues with IRM

critical

Started: Jun 5, 4:11 PM

monitoring

Continuing to monitor progress. Most customers affected should have all services restored, with a few remaining customers receiving updates as the rollout finishes out. Thanks again for your patience.

Jun 5, 8:10 PM

monitoring

A fix has been released to production and is rolling out across the IRM fleet, restoring access for affected customers. Thanks for your patience through this work. We're continuing to monitor to confirm that we've returned to a steady state.

Jun 5, 7:34 PM

monitoring

We've identified a regression in one of our recent code changes that was preventing our previous fix from taking effect. We're deploying a correction now and applying a hotfix in the interim to restore access quickly.

Jun 5, 6:40 PM

monitoring

A fix is being deployed now, and we are monitoring the progress.

Jun 5, 6:00 PM

identified

We've identified an issue with RBAC, and are working on a fix to restore permission services for those affected.

Jun 5, 5:14 PM

investigating

Our engineering team is still investigating this issue. We do not have any new information to share at this time, but will continue to provide timely updates.

Jun 5, 4:52 PM

investigating

We are currently investigating an issue impacting permissions for IRM. As a result, users are not currently getting paged. We will provide updates as they become available.

Jun 5, 4:11 PM

Silences not Working as Expected

major

Started: Jun 5, 2:02 PM

identified

We have identified an issue causing silences to not work as expected in the Cloud (Mimir) Alertmanager. Grafana Alertmanager is working normally; only data source-managed alerts are affected.

Jun 5, 2:02 PM

Grafana Assistant Skills Page Blank

major

Started: Jun 4, 4:44 PM

identified

The issue has been identified, and we are working on a fix.

Jun 4, 5:00 PM

investigating

We are continuing to investigate this issue.

Jun 4, 4:56 PM

investigating

We are currently investigating an issue affecting the Skills page of Grafana Assistant. Impacted deployments will encounter a blank screen when attempting to access this page. At this time, we have observed partial impact in the us-east-0 and us-central-0 regions, and will provide an update here if the scope of impact expands.

Jun 4, 4:44 PM

K6 Test Runs Degraded Performance

minor

Started: Jun 3, 8:40 PM

monitoring

We have applied a fix, and are monitoring the results.

Jun 3, 9:23 PM

investigating

We are currently investigating an issue causing k6 test runs to take longer than expected to complete, or to time out within Grafana Cloud.

Jun 3, 8:40 PM

Synthetic Scripted/Browser checks failure

major

Started: Jun 3, 11:38 AM

identified

We are in the process of deploying a fix for this issue.

Jun 3, 5:07 PM

investigating

We’re currently investigating an issue affecting Synthetic Monitoring where updates for Scripted/Browser checks might fail. Our team is actively working to identify the cause. Thank you for your patience.

Jun 3, 11:38 AM

tempo prod-25 write-path-down

minor

Started: Jun 2, 11:15 PM

identified

Between 21:20 and 22:40 UTC, writes to tempo-prod-25 failed due to an outage. tempo-prod-24 was also affected during an overlapping window from 22:32 to 22:40 UTC.

Jun 2, 11:15 PM

Alert manager unavailable in prod-us-central-0

minor

Started: Jun 1, 7:53 PM

monitoring

A fix has been implemented and we are monitoring the results.

Jun 1, 8:02 PM

identified

Starting at 18:30 UTC, we noticed Alertmanager unavailability limited to prod-us-central-0. This affects both Grafana-managed and data source-managed alerting: updates to Alertmanager configuration are disrupted, and alert sending is partially disrupted. We have identified the cause and are in the process of remediation.

Jun 1, 7:53 PM

May 2026

Grafana Loki Log Query Issues

major

Started: May 29, 9:03 AM

monitoring

We have identified the cause of this incident and a fix has been applied. Normal functions are returning. We are currently monitoring the recovery process.

May 29, 10:28 AM

investigating

We’re currently investigating an issue affecting Loki queries in Grafana. Customers have reported logs not loading or logs missing from query results. Our team is actively working to identify the cause. Thank you for your patience.

May 29, 9:03 AM

Prometheus Datasource Errors/Outage in prod-us-east-0

major

Started: May 27, 8:22 PM

investigating

We are seeing recovery across affected Prometheus datasources, and error rates have significantly improved. The service is recovering without any required customer action, and our team continues to monitor stability while we investigate the underlying cause. We’ll provide another update as we learn more.

May 27, 9:57 PM

investigating

We continue to investigate an issue affecting Prometheus datasources causing intermittent timeouts and unexpected errors, primarily impacting alert rule evaluations. Our team is actively working to identify the cause. Thank you for your patience.

May 27, 9:36 PM

investigating

We’re currently investigating an issue affecting Prometheus datasources that is causing HTTP 500 Internal Server Error responses or other unexpected errors. Our team is actively working to identify the cause. Thank you for your patience.

May 27, 8:22 PM

Grafana K6 metrics processing and test runs degradation

minor

Started: May 18, 8:24 AM

monitoring

We've stabilized the system, and test runs no longer time out. There is a small delay of a few minutes in processing metrics at the end of a test run, but this should not significantly affect most users. We expect the delay to resolve within the next 30-60 minutes.

May 18, 2:30 PM

investigating

We have identified that test runs are timing out as a result of this issue, which first occurred on May 15, 2026 at 8:00 PM UTC.

May 18, 10:27 AM

investigating

We’re currently investigating an issue that is degrading metrics processing performance, so test run metrics may take longer than usual to appear. Our team is actively working to identify the cause. Thank you for your patience.

May 18, 8:24 AM

Intermittent Errors and High latency Writing to Cloud Metrics, Cloud Logs and Cloud Traces

minor

Started: May 13, 8:50 AM

monitoring

We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.

May 13, 9:10 PM

monitoring

We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.

May 13, 3:41 PM

monitoring

We are seeing signs of recovery and improved stability across impacted services over the past hour. Our teams continue to closely monitor the situation while working with the cloud provider.

May 13, 1:37 PM

investigating

We have identified expanded impact affecting Grafana Cloud Logs and Grafana Cloud Traces in addition to Cloud Metrics, causing intermittent errors and increased latency when writing data. Our teams continue working on a fix and investigating the issue with the cloud provider’s support team.

May 13, 10:25 AM

investigating

We’re continuing to investigate the issue causing intermittent errors and high latency when writing to Cloud Metrics. We are in contact with the cloud provider’s support team, and they are investigating the issue alongside us.

May 13, 10:01 AM

investigating

We’re currently investigating an issue causing intermittent errors and high latency when writing to Cloud Metrics. Our team is actively working to identify the cause. Thank you for your patience.

May 13, 8:50 AM

"Failed to Load Dashboard" Errors

major

Started: May 11, 9:38 PM

identified

The fix is currently being rolled out to all impacted environments.

May 12, 2:13 PM

identified

Our teams continue working on a fix for this issue. We do not have additional information to share at this time, but we will continue to provide updates as progress is made.

May 12, 11:11 AM

identified

We are continuing to work on a fix for this issue. While we do not have additional updates to share at this time, our teams remain actively engaged and we will provide further updates as soon as they become available.

May 12, 8:58 AM

identified

Customers on Grafana Cloud may see an error on dashboard panels reading "Failed to load dashboard ... json unmarshal number ...". We have identified the issue and are working to deploy the fix.

May 11, 9:38 PM

SSL/TLS Connectivity Issues

major

Started: May 11, 8:49 PM

investigating

We are currently investigating reports of service disruption affecting a subset of customers. Customers may experience intermittent connectivity issues, degraded performance, or SSL/TLS certificate validation errors when accessing affected services. Our engineering teams are actively working to identify the scope of impact and restore full functionality as quickly as possible. We will continue to provide updates as more information becomes available.

May 11, 8:49 PM

Cloud Metrics - High Write Latency and Errors in prod-us-central-7

minor

Started: May 8, 9:16 PM

monitoring

From approximately 20:40 to 21:00 UTC, we experienced an issue affecting Grafana Cloud Metrics in prod-us-central-7. Affected users may have experienced high latency and/or errors during ingestion and rule evaluation. Our team has identified the cause and mitigated it. We are currently monitoring for long-term stability.

May 8, 9:16 PM

Metrics read errors in prod-ap-south-1 region

critical

Started: May 7, 7:18 AM

monitoring

Engineering has released a fix and as of 07:50 UTC, customers should no longer experience errors when querying metrics. We will continue to monitor for recurrence and provide updates accordingly.

May 7, 7:53 AM

investigating

At approximately 06:24 UTC, we were alerted to read errors in mimir-prod-43. Users with instances hosted in the prod-ap-south-1 region may encounter an error message when querying metrics. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

May 7, 7:18 AM

Datasource Query Performance Issues

minor

Started: May 6, 8:07 PM

investigating

We’re currently investigating an issue affecting Datasource query performance in prod-us-east-4. Our team is actively working to identify the cause. Thank you for your patience.

May 6, 8:07 PM

Elevated Error Rate of Browser Checks in PoP Oregon

minor

Started: May 5, 4:11 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

May 5, 7:44 PM

identified

We’ve identified the cause of the issue impacting browser checks. Our team is currently implementing a fix.

May 5, 6:13 PM

investigating

We’re currently investigating an issue affecting browser checks in the PoP Oregon region. Our team is actively working to identify the cause. Thank you for your patience.

May 5, 4:11 PM

k6 Partial Outage

major

Started: May 4, 10:58 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

May 5, 12:04 AM

investigating

After further investigation, this issue may also be affecting Synthetic Monitoring. We are continuing to investigate the cause and will provide an update as soon as we have more information.

May 4, 11:23 PM

investigating

We’re currently investigating an issue affecting k6. Our team is actively working to identify the cause. Thank you for your patience.

May 4, 10:58 PM

Ingestion Errors for AWS Cloud Provider Observability Metric Streams in prod-us-central-7

major

Started: May 1, 9:14 AM

monitoring

A fix has been implemented and we are monitoring the results.

May 1, 9:43 AM

investigating

We are continuing to investigate this issue.

May 1, 9:42 AM

investigating

We are investigating an issue with ingesting metrics for AWS Cloud Provider Observability with Metric Streams. Users affected by this issue may encounter ingestion errors in the prod-us-central-7 region only, starting at approximately 06:30 UTC. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

May 1, 9:14 AM

April 2026

Investigating Issues Saving SQL Datasource Credentials

minor

Started: Apr 28, 6:46 PM

monitoring

We’ve identified the cause of the issue impacting SQL datasources. Our team is currently implementing a fix and monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

Apr 28, 6:59 PM

investigating

We are currently investigating reports of issues affecting SQL-based data sources where users are unable to save credentials. This appears to impact a subset of customers and may be occurring across multiple regions. We are actively working to determine the scope and root cause. We will provide updates as more information becomes available.

Apr 28, 6:46 PM

Gateway Slowness Detected in Prod (US-East-1)

minor

Started: Apr 28, 9:20 AM

investigating

The rate of successful requests has dropped, and users may be unable to access their instances. The issue is under investigation.

Apr 28, 9:20 AM

InfluxDB Datasource - Intermittent Failures

major

Started: Apr 27, 5:08 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

Apr 27, 11:13 PM

identified

We’ve identified the cause of the issue impacting the InfluxDB datasource. Our team is currently implementing a fix.

Apr 27, 6:01 PM

investigating

We’re currently investigating an issue affecting the InfluxDB plugin. Some users may see intermittent failures. Our team is actively working to identify the cause. Thank you for your patience.

Apr 27, 5:08 PM

Cloudwatch Datasource Outage

major

Started: Apr 23, 2:26 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

Apr 23, 2:39 PM

investigating

We’re currently investigating an issue affecting Cloudwatch datasources. Our team is actively working to identify the cause. Thank you for your patience.

Apr 23, 2:26 PM

Restrictions on Alerts & Reports for Grafana Cloud Free/Trial Users

minor

Started: Apr 20, 9:12 PM

monitoring

Grafana Labs is implementing measures to safeguard the Grafana Cloud platform against ongoing unauthorized use while preserving the capabilities relied upon by our community. Effective immediately, we have made the following modifications to the platform: Alerting Email alerting has been disabled for new Grafana Cloud Free and Trial accounts; however, all other integrations such as webhooks remain functional. Additionally, Cloud Alertmanager is now disabled for Grafana instances in these acc...

Apr 22, 3:03 PM

monitoring

We are continuing to monitor for any further issues.

Apr 20, 10:07 PM

monitoring

Grafana Labs is taking steps to safeguard our Grafana Cloud platform against unauthorized use while maintaining the Grafana Cloud Free and Trial tiers of service our users and the community have come to rely on. As of Monday April 20, alerting and reporting capabilities have been disabled in new Grafana Cloud Free and trial stacks. We are working towards deploying improvements and restoring those functionalities in a way that keeps our platform secure and open for all of our users.

Apr 20, 9:12 PM

Elevated 429 Errors Impacting Metrics Querying Across Multiple Regions

critical

Started: Apr 20, 2:09 PM

investigating

The issue is now confirmed to be widespread, affecting Prometheus across all regions. Customers may continue to experience elevated 429 (rate limit) errors, particularly when querying metrics, with failures or inconsistent responses possible. Our engineering team remains fully engaged and is actively working on mitigation and resolution efforts with the highest priority.

Apr 20, 2:21 PM

investigating

We are currently experiencing a major incident causing elevated 429 (rate limit) errors across multiple regions, primarily impacting metrics querying. This is a high-priority issue, and our engineering team is actively engaged and working urgently to identify the root cause and restore full service as quickly as possible. Customers may experience widespread failures or delays when querying metrics during this time. We understand the significant impact this may have and will continue to prov...

Apr 20, 2:09 PM

Query Caching - Degraded Performance

minor

Started: Apr 17, 9:23 PM

monitoring

prod-us-east-0 and prod-eu-west-3 have now recovered, and we are continuing to monitor prod-us-central-0, which is still in the process of recovering.

Apr 17, 10:09 PM

investigating

Since 20:52 UTC, we have been investigating degraded Query Caching performance in multiple regions. For datasources where query caching is configured, some queries may take longer than usual. Our team is actively working to identify the cause. Thank you for your patience.

Apr 17, 9:23 PM

Issues on Stack creation

minor

Started: Apr 16, 12:52 PM

monitoring

The issue is fixed and we are currently monitoring the service.

Apr 16, 1:19 PM

identified

Since approximately 12:11 UTC today (April 16), we have been seeing issues with stack creation across all of our regions. Customers will see an error message when attempting to create a stack. Our engineering team has identified the source of the issue as an external provider, outside of Grafana, and is tracking its recovery.

Apr 16, 12:52 PM

Degraded Ticket Visibility in Support System

minor

Started: Apr 15, 4:07 PM

monitoring

We are currently experiencing an issue with our ticketing system provider that is affecting how tickets appear within our internal support views. We are continuing to receive all new tickets successfully, and no requests are being lost at this time. Our team is actively monitoring the situation and working to ensure all incoming requests are reviewed, including those that may not be immediately visible in standard views. We will provide further updates as we receive more information from o...

Apr 15, 4:07 PM

K6 Sporadic DNS Issues

minor

Started: Apr 14, 9:22 AM

monitoring

Our engineering team has deployed a fix and we are currently monitoring the behaviour of the system until full resolution.

Apr 14, 2:29 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

Apr 14, 2:29 PM

identified

We are having sporadic DNS issues that occasionally affect the start of cloud test runs, causing them to abort. We are currently working to resolve. The issue has been occurring since April 9.

Apr 14, 9:22 AM

Grafana Cloud Logs - Write degradation in us-east-3

major

Started: Apr 10, 11:53 PM

investigating

We are seeing issues on the write path for Loki in the us-east-3 cluster, and we are actively investigating.

Apr 10, 11:53 PM

Tempo Write Outage

major

Started: Apr 10, 7:42 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within an hour.

Apr 10, 7:53 PM

investigating

We are currently investigating a write outage affecting prod-us-east-3. The issue began at 18:50 UTC. Users may experience errors, timeouts, or unavailability while we work to identify the cause and restore service.

Apr 10, 7:42 PM

K6 Browser Testing/Timeline Not Available

minor

Started: Apr 9, 5:34 PM

investigating

We’re currently investigating an issue affecting browser testing. Users running browser tests will not be able to see the browser timeline. Our team is actively working to identify the cause and will share an update within two hours. Thank you for your patience.

Apr 9, 5:34 PM

Unable to Edit Notification Policies

minor

Started: Apr 7, 3:17 PM

identified

We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.

Apr 7, 6:03 PM

identified

We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.

Apr 7, 4:52 PM

investigating

We’re currently investigating an issue affecting notification policies. Our team is actively working to identify the cause and will share an update within 2 hours. Thank you for your patience.

Apr 7, 3:17 PM

Notification Policies and Contact Points Missing in UI on the Slow Release Channel

minor

Started: Apr 6, 2:48 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within 2 hours.

Apr 6, 11:58 PM

identified

We’ve identified the cause of the issue impacting the Notification Policy and Contact Point UI. Our team is currently implementing a fix. We’ll provide another update once the fix is deployed and we are monitoring for the expected improvement.

Apr 6, 9:04 PM

investigating

We’re continuing to investigate the issue with the alerting UI. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 2 hours.

Apr 6, 4:13 PM

investigating

We’re currently investigating an issue affecting notification policies and contact points for instances on the slow release channel. Alerting API calls for contact points and notification policies return data as expected, so this appears to be limited to the UI. Our team is actively working to identify the cause and will share an update within 1-2 hours. Thank you for your patience.

Apr 6, 2:48 PM

Partial K6 Test Run Outage

major

Started: Apr 3, 3:29 PM

investigating

We're experiencing an outage affecting test runs that use k6 extensions. The issue prevents users from executing these types of test runs both locally and in Grafana Cloud. Test runs that do not use extensions are not affected by this incident.

Apr 3, 3:29 PM

AWS integration Degraded Performance

minor

Started: Apr 1, 8:17 PM

investigating

We are investigating a noticeable drop in active series for the AWS integration that began around 18:15 UTC. This issue may cause scrapes to hit rate limits, which can result in individual data points not being collected for the serverless integration. The impact is intermittent and may affect any customer using the AWS integration, regardless of region. We are currently working to identify the cause and will provide an update as soon as we have more information.

Apr 1, 8:17 PM

Query degradation and possible rule evaluation failure on prod-eu-west-0.cortex-prod-01

minor

Started: Apr 1, 9:56 AM

monitoring

A fix has been implemented and we are monitoring the results.

Apr 1, 10:12 AM

investigating

We are continuing to investigate this issue.

Apr 1, 10:11 AM

investigating

We are currently observing delays in data ingestion in the prod-eu-west-0.cortex-prod-01 metrics cell, which may cause partial query results and failed rule evaluations.

Apr 1, 9:56 AM

March 2026

Some of the CloudWatch queries are failing

major

Started: Mar 31, 9:48 AM

monitoring

We are continuing to monitor for any further issues.

Mar 31, 9:49 AM

monitoring

Some CloudWatch queries were failing. The issue started at 08:37 UTC, and we have been monitoring since 09:21 UTC.

Mar 31, 9:48 AM

Some Grafana Instances Unavailable

major

Started: Mar 27, 1:36 PM

monitoring

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again in 1 hour.

Mar 27, 8:16 PM

identified

We’ve identified the cause of the issue impacting the instances. Our team is currently implementing a fix. We’ll provide another update in 1–2 hours, or sooner, if the situation changes.

Mar 27, 6:10 PM

investigating

We’re continuing to investigate the issue with Grafana instances. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 1-2 hours.

Mar 27, 4:36 PM

investigating

We’re continuing to investigate the issue with Grafana instances. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 1-2 hours.

Mar 27, 2:51 PM

investigating

We’re currently investigating an issue that is primarily affecting users on the Free tier. Impacted users see a "your Grafana instance is loading" message indefinitely. Our team is actively working to identify the cause and will share an update within 1-2 hours. Thank you for your patience.

Mar 27, 1:36 PM

Prometheus writes in prod-eu-west-3 are degraded

critical

Started: Mar 25, 2:11 PM

monitoring

We are continuing to monitor for any further issues.

Apr 20, 3:08 PM

monitoring

We have deployed mitigation and seen improvement in write failures over the past week. We are still seeing intermittent spikes in latency and continue to monitor.

Apr 14, 8:11 PM

monitoring

We are still seeing intermittent issues and continue to seek a resolution.

Apr 8, 8:32 PM

monitoring

We are continuing to monitor for any further issues.

Apr 2, 9:38 PM

monitoring

We are continuing to monitor this through the weekend.

Mar 27, 9:05 PM

monitoring

We are continuing to monitor the previously impacted environments.

Mar 26, 5:45 PM

monitoring

A fix has been implemented and we are monitoring the results.

Mar 26, 12:04 PM

investigating

We are continuing to investigate this issue.

Mar 25, 9:35 PM

investigating

The metric writes issue reported in https://status.grafana.com/incidents/gfshj17lxj5z is still ongoing. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.

Mar 25, 2:11 PM

Prometheus writes, Logs, and Synthetic Monitoring in prod-eu-west-3 are degraded

minor

Started: Mar 24, 9:08 AM

investigating

This issue is now also impacting Logs and Synthetic Monitoring in prod-eu-west-3. For Synthetic Monitoring, users might see errors when pushing check execution metrics, which can eventually lead to missing data. Users might also see errors when Synthetic Monitoring provisioned alert rules are evaluated, which can lead to missed alerts. For Logs, there is no immediate impact on alerts; however, remote writes to Mimir are delayed, which means users may see gaps in their recordin...

Mar 25, 7:43 AM

investigating

We are moving this back to 'Investigating' as we are now observing a substantial drop in successful ingestion, an increase in write path errors, and elevated rule evaluation latency and errors. Reads are mostly unaffected. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.

Mar 25, 7:04 AM

monitoring

We have not observed any recent errors, but we will continue to monitor while we work with our CSP.

Mar 24, 9:23 PM

monitoring

A fix has been implemented and we are monitoring the results.

Mar 24, 9:19 AM

investigating

We are currently experiencing degraded writes for mimir-prod-22 in prod-eu-west-3 since 08:45Z.

Mar 24, 9:08 AM

Grafana Assistant Unavailable in prod-us-east-0

major

Started: Mar 23, 5:03 PM

identified

The issue has been identified, and we are implementing a fix.

Mar 23, 6:25 PM

investigating

The impact extends beyond the TOS check. Assistant is completely unavailable in the impacted region.

Mar 23, 6:07 PM

investigating

We are continuing to investigate this issue.

Mar 23, 6:01 PM

investigating

We are aware of an issue currently impacting Grafana Assistant. Impacted users are prompted to accept the TOS, but the plugin fails after they accept. Our engineering team is currently investigating this issue.

Mar 23, 5:03 PM

Authentication API Database Down in prod-eu-west-2 and prod-eu-west-4

major

Started: Mar 20, 3:00 PM

investigating

We have observed impact in prod-eu-west-4 as well.

Mar 20, 3:08 PM

investigating

We are currently investigating an issue impacting the main database for the Authentication APIs in the prod-eu-west-2 region. Writes are currently failing, but reads are operational.

Mar 20, 3:00 PM

Various Datasource Issues

major

Started: Mar 19, 4:46 PM

monitoring

We are continuing to monitor for any further issues.

Mar 19, 5:56 PM

monitoring

We have observed recovery for the CloudWatch datasource. We are now seeing failures for the following datasources: Aurora, OpenSearch, X-Ray, Timestream, Redshift, and SiteWise. A fix for these is being rolled out now, and we will monitor progress. We are also renaming this incident from "Cloudwatch Datasource Issues" to "Various Datasource Issues" to more accurately reflect the impact.

Mar 19, 5:56 PM

monitoring

We have identified the issue, and are rolling out the fix. We are already seeing improvements and will continue to monitor progress.

Mar 19, 5:13 PM

investigating

We are currently investigating an issue that is causing failures in the CloudWatch datasource.

Mar 19, 4:46 PM

Degraded performance of Grafana Cloud k6 test runs

major

Started: Mar 19, 11:17 AM

investigating

Some customers are seeing degraded performance and errors from certain v6 API endpoints. We are investigating the issue.

Mar 19, 11:17 AM

Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

minor

Started: Mar 13, 10:28 AM

investigating

We are continuing to investigate this issue with our CSP, and will provide updates as they become available.

Mar 13, 9:22 PM

investigating

We are seeing issues on the write path for Loki in the Azure Netherlands (eu-west-3) cluster. This will degrade log ingestion in that cluster. Our engineering team is already working on restoring the service.

Mar 13, 10:28 AM

Increased number of Aborted-by-Systems with k6 binary building errors

major

Started: Mar 13, 7:41 AM

monitoring

A fix has been implemented and we are monitoring the results.

Mar 13, 12:49 PM

identified

The issue has been identified and a fix is being implemented.

Mar 13, 8:45 AM

investigating

We are seeing an increased number of Aborted-by-Systems with a k6 binary building error, and we are investigating the issue. It first occurred on March 9 and has now been identified as a blocking issue for some customers.

Mar 13, 7:41 AM

Rule Evaluation Outage in prod-us-west-0

major

Started: Mar 11, 5:10 PM

monitoring

A fix has been implemented and we are monitoring the results.

Mar 11, 6:02 PM

investigating

We are currently investigating an issue impacting rule evaluation for a subset of customers in the prod-us-west-0 region. We will provide updates as they become available.

Mar 11, 5:10 PM

Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

minor

Started: Mar 11, 8:31 AM

investigating

We are also reporting impact to Faro performance in the same region. We are continuing to investigate this issue.

Mar 11, 9:13 AM

investigating

Mar 11, 8:31 AM

Complete outage in prod-me-central-1

critical

Started: Mar 2, 6:43 AM

investigating

The TLS certificates serving prod-me-central-1 endpoints expire on May 30, 2026. Replacement certificates have been imported, but the ongoing AWS regional incident is preventing them from propagating to all load balancer nodes, so customers may see certificate errors after that date until AWS restores normal operation. We do not have any additional updates to share at this time. Our team is actively monitoring the situation and will provide further information as it becomes available. In th...

May 27, 5:10 PM

investigating

AWS UAE - prod-me-central-1: Public Probe checks might suffer degraded experience. We recommend migrating checks from the UAE probe to the next nearest probe suitable for your use case.

May 21, 11:41 AM

investigating

We do not have any additional updates to share at this time. Our team is actively monitoring the situation and will provide further information as it becomes available. In the meantime, please continue to refer to the AWS Status Page for the most detailed and up-to-date information.

May 13, 9:59 PM

investigating

We are continuing to investigate this issue.

Apr 20, 3:11 PM

investigating

We have not received any further updates from AWS at this time. However, we are actively monitoring the outage and will provide additional information as it becomes available. Also, please continue to refer to the AWS status page for more detailed updates. https://health.aws.amazon.com/health/status All the guidance previously included about stack migration is still relevant. Please reach out to our Support team if you have any questions.

Mar 19, 12:13 PM

investigating

We are actively monitoring the situation, but at this time there are no new updates to share. The next update will be provided once we have more information to share. Please reach out to our Support team if you have any questions.

Mar 4, 10:22 PM

investigating

We are continuing to investigate this issue.

Mar 4, 10:28 AM

investigating

Please continue to refer to the AWS status page for more detailed updates specific to AWS. https://health.aws.amazon.com/health/status AWS are recommending that affected customers move workloads to alternate regions, and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to: 1. Create a Grafana Cloud stack in an alternate region 2. Update clients to send telemetry to the new region, if using Grafana Alloy then you can use Fle...

Mar 2, 10:18 PM

investigating

AWS are recommending that affected customers move workloads to alternate regions https://health.aws.amazon.com/health/status and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to: 1. Create a Grafana Cloud stack in an alternate region 2. Update clients to send telemetry to the new region, if using Grafana Alloy then you can use Fleet Management https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/...

Mar 2, 10:31 AM

investigating

We recommend that customers configure a new blank stack in an alternative Grafana Cloud region and reconfigure their clients (such as Grafana Alloy) to send telemetry to that region. Fleet Management can be used for this purpose: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/

Mar 2, 10:04 AM

investigating

We are updating this incident to reflect a complete outage in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

Mar 2, 8:36 AM

investigating

We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

Mar 2, 8:21 AM

investigating

Mar 2, 8:14 AM

investigating

We are seeing elevated write and read path errors in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

Mar 2, 6:43 AM

February 2026

Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central, prod-us-central-5, and prod-eu-west-0

minor

Started: Feb 25, 7:54 PM

monitoring

We are rolling out a mitigation across the environments in these regions, and preemptively where possible to ensure it doesn’t spread elsewhere.

Mar 6, 9:44 PM

monitoring

We have seen an increase in latency in our cloud provider's services and are rolling out a change to mitigate the issue. We are monitoring.

Mar 6, 8:53 PM

monitoring

We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for tracking.

Mar 5, 10:22 PM

monitoring

We are continuing to investigate this issue alongside the CSP. Any notable updates will continue to be shared here for tracking.

Feb 27, 10:05 PM

monitoring

We have put mitigation in place and are continuing to monitor and investigate this issue.

Feb 27, 2:55 PM

investigating

We have begun rolling out mitigation steps to reduce write latency in the prod-us-central-0 and prod-us-central-5 regions. While these measures are expected to improve performance, we are continuing to investigate the underlying root cause of the issue. We will provide additional updates as more information becomes available.

Feb 26, 4:23 PM

investigating

Since February 19, we have been investigating an intermittent issue causing increased write latency in the prod-us-central-0 and prod-us-central-5 regions. The issue does not affect all traffic but may result in delayed write operations for some customers. Our engineering team is actively working to identify the root cause and stabilize performance. We will share additional updates as progress is made.

Feb 25, 7:54 PM
