Displaying traffic and metrics data in Web Console partially degraded
Incident Report for Split
Postmortem

Subject: Split Service Advisory - Results / Metrics UI Failure - 2018-02-28

Dear Customer,

This is a Split Service Advisory (SSA) regarding an incident in which Traffic and Metrics data in the Split web console was inaccessible under some circumstances from February 27th 20:19 PST to February 28th 08:59 PST 2018. Split provides mission-critical services, and we treat any action that can cause a service degradation with the utmost sensitivity and priority. Our goal in this SSA is to explain our understanding of the cause of the disruption and to describe the corrective actions that we will be taking.

Incident Background

At 20:19 PST on February 27th, 2018, the Metrics and Results section in the Split web console became unreachable when changing environments. The outage lasted until 08:52 PST on February 28th, when the production fix was deployed and service was restored.

Event Timeline and Detailed Customer Impact

  • 2018-02-27 20:19 PST - Incident reported by one of our customers.
  • 2018-02-27 21:34 PST - Split engineers began investigating the issue.
  • 2018-02-27 21:52 PST - Engineering team was able to reproduce.
    • Impact: Metrics and Traffic sections in some Splits were returning empty data when changing from one environment to another.
  • 2018-02-27 22:50 PST - https://status.split.io/ updated to investigating.
  • 2018-02-28 05:00 PST - Engineering team started working on the fix.
  • 2018-02-28 07:00 PST - Fixes were applied in our development environment web console.
  • 2018-02-28 07:11 PST - https://status.split.io/ updated to identified.
  • 2018-02-28 08:00 PST - Fixes were tested in our web console.
  • 2018-02-28 08:52 PST - Split engineers deployed production fixes.
  • 2018-02-28 08:59 PST - Split engineers validated the fixes in production and resolved the incident.
    • Impact: Metrics and Traffic sections for Splits returned data again when changing between environments.
  • 2018-02-28 09:16 PST - https://status.split.io/ updated to monitoring.

Technical Cause

Split’s web console calculates a couple of page load time metrics on the fly for the Metrics and Results section, using a custom utility. When a user changed environments, the utility was not re-initialized correctly, so it failed when the section load time metric had to be calculated. This caused an execution error that blocked the JavaScript thread. As a consequence, when navigating from one environment to another, the chart, impressions, and metric cards did not load properly, and the section appeared to have no data.

The issue was introduced when the routing library we use in the UI was updated to the latest version and deployed to our production environment. After the update, the code that re-initializes the utility on an environment change was incorrectly placed in the routing flow and no longer ran in the expected order.
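
For readers who want a concrete picture of this failure mode, the following is a minimal sketch and not Split's actual code. The names LoadTimer, switchEnvironmentBuggy, and switchEnvironmentFixed are hypothetical, as is the fixed 120 ms load time. It only illustrates how measuring before re-initializing raises an error that stops the rest of the section from rendering.

```typescript
// Hypothetical timing utility; all names here are illustrative assumptions.
class LoadTimer {
  private startedAt: number | null = null;

  // Must be called at the start of each section load.
  init(now: number): void {
    this.startedAt = now;
  }

  // Clears state, for example when the user leaves an environment.
  reset(): void {
    this.startedAt = null;
  }

  // Returns elapsed milliseconds; throws if init() has not been called.
  measure(now: number): number {
    if (this.startedAt === null) {
      throw new Error("LoadTimer used before init()");
    }
    return now - this.startedAt;
  }
}

interface RenderResult {
  rendered: boolean;
  loadTimeMs: number;
}

// Wrong order: the timer is measured before it is re-initialized, so
// measure() throws and the function never reaches the return statement.
function switchEnvironmentBuggy(timer: LoadTimer, now: number): RenderResult {
  timer.reset();
  const loadTimeMs = timer.measure(now + 120); // throws here
  timer.init(now); // too late: never reached
  return { rendered: true, loadTimeMs };
}

// Expected order: re-initialize first, then measure.
function switchEnvironmentFixed(timer: LoadTimer, now: number): RenderResult {
  timer.reset();
  timer.init(now);
  const loadTimeMs = timer.measure(now + 120);
  return { rendered: true, loadTimeMs };
}
```

In this sketch, the uncaught error in the first function plays the role of the execution error described above: everything after the failed measurement, including rendering, is skipped.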

Remediation

The Split Team pushed a fix that allows the utility to be initialized correctly when changing environments, so the console error is avoided and the JavaScript thread continues to work normally.

Our automation harnesses in our testing environments have been updated to cover this type of scenario. These new suites are designed to prevent this type of incident from reaching our customers again.
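
As an illustration only, a check of this kind could look like the sketch below. It is not our actual test suite: the SectionLoader type, the SwitchFailure shape, and the checkEnvironmentSwitches helper are assumptions made for the example. The idea is to load the section in one environment, switch to another, and record any switch that throws or comes back empty.

```typescript
// Hypothetical loader: loads the Metrics section for an environment and
// returns the number of data points rendered. May throw on an execution error.
type SectionLoader = (environment: string) => number;

interface SwitchFailure {
  from: string;
  to: string;
  reason: string;
}

// Tries every ordered pair of distinct environments and collects failures.
function checkEnvironmentSwitches(
  load: SectionLoader,
  environments: string[]
): SwitchFailure[] {
  const failures: SwitchFailure[] = [];
  for (const from of environments) {
    for (const to of environments) {
      if (from === to) continue;
      try {
        load(from); // initial load in the starting environment
        const points = load(to); // load again after switching
        if (points === 0) {
          failures.push({ from, to, reason: "empty data after switch" });
        }
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        failures.push({ from, to, reason });
      }
    }
  }
  return failures;
}
```

A suite built this way would pass only when the returned list is empty, so both a thrown error and an empty section after a switch count as failures.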

Conclusion

We would like to apologize for any impact you may have experienced as a result of this event. We value the trust you've placed in us and will endeavor to improve our processes, procedures, and systems.

For further questions, please contact support@split.io.

Posted Feb 28, 2018 - 17:23 PST

Resolved
This incident has been resolved.
Posted Feb 28, 2018 - 11:00 PST
Monitoring
Our team has pushed a fix to address this issue and is monitoring the Web Console closely for any continued issues displaying traffic and metrics data.
Posted Feb 28, 2018 - 09:16 PST
Identified
Users may be experiencing intermittent issues with traffic and metrics data loading properly. Our team has identified the issue and will be pushing a change shortly.
Posted Feb 28, 2018 - 07:11 PST
Investigating
Displaying traffic data and metrics results within Split's Web Console is partially degraded and displaying results intermittently. We are investigating.
Posted Feb 27, 2018 - 22:50 PST