SRE & DevOps Incident Response
Automated response to performance degradation of applications deployed on Azure
Site Reliability Engineers face continuous pressure to keep their cloud services up and running. Incidents can occur at any time and must be resolved quickly. Automate root cause analysis, false positive elimination, and incident response in a few clicks, and keep your SRE team focused on relevant and complex issues.
Incident response with human-in-the-loop
It is 2 AM. Azure Monitor detects a rise in access errors for one of the critical applications. The SRE team is notified, and a member of the team has to get out of bed and start the troubleshooting and remediation process!
Why not automate it?
With Ubility, SREs can deploy an automation flow that connects to Azure Monitor, listens for notifications, and starts troubleshooting automatically on behalf of the SRE team. The team is involved only when a critical issue is detected. All of this without writing a single line of code!
The SRE configures the incident response flow using the Ubility flow designer.
Azure Monitor triggers an alarm when a certain threshold of 4xx/5xx errors is reached.
Azure Monitor automatically checks the CPU and memory usage of the application in question.
If CPU or memory usage is high, the SRE is notified through Slack.
The notification contains all relevant CPU and memory details and asks the SRE to upgrade the Kubernetes cluster.
The SRE clicks the upgrade button in the Slack message.
The flow automatically upgrades AKS and logs a ticket in Jira with all the details.
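For readers who want to see the decision logic behind the steps above, here is a minimal Python sketch. It is not Ubility's implementation, and the flow itself is built without code. The thresholds, the `Metrics` type, the `handle_alert` function, and the action names are all hypothetical. The sketch only returns the names of the actions the flow would take and does not call Azure, Slack, or Jira.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the use case does not state actual values.
ERROR_THRESHOLD = 50   # 4xx/5xx errors that trigger the alarm
CPU_HIGH = 0.85        # fraction of CPU considered "high"
MEMORY_HIGH = 0.85     # fraction of memory considered "high"


@dataclass
class Metrics:
    cpu: float      # CPU utilisation, 0.0 to 1.0
    memory: float   # memory utilisation, 0.0 to 1.0


def handle_alert(error_count: int, metrics: Metrics, sre_approved: bool) -> List[str]:
    """Return the ordered actions the flow would take for one alert."""
    actions: List[str] = []

    # The alarm fires only once the error threshold is reached.
    if error_count < ERROR_THRESHOLD:
        return actions

    # The SRE is involved only if CPU or memory usage is high.
    # What happens otherwise is not described in the use case,
    # so this sketch takes no action in that case.
    if metrics.cpu < CPU_HIGH and metrics.memory < MEMORY_HIGH:
        return actions

    # Human-in-the-loop: notify through Slack and wait for approval.
    actions.append("notify_slack")
    if sre_approved:
        actions.append("upgrade_aks")
        actions.append("create_jira_ticket")
    return actions
```

With these assumed thresholds, an alert with 120 errors and 93% CPU leads to a Slack notification, and the AKS upgrade and Jira ticket follow only once the SRE approves.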