How To Improve Incident Routing With Cloudaware CMDB and PagerDuty

Challenge

Getting the right people involved during an incident is probably the most important factor in how quickly it gets resolved. However, as cloud environments sprawl across multiple providers, accounts and subscriptions, and as cloud providers keep adding services, reaching the right responders can become a challenge.

The events that enter PagerDuty from monitoring tools such as New Relic and Datadog often carry little or no structured data. Without that data we cannot take full advantage of PagerDuty features such as:

  • Filtering
  • Routing
  • Grouping
  • Escalating
  • Response Plays

Solution

Instead of sending event signals into PagerDuty directly, consider passing them through a CMDB such as Cloudaware. A CMDB holds a trove of information about the resource involved in an incident, from business contacts and mission criticality to components and layers.

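PagerDuty uses its own Common Event Format (PD-CEF), which defines standard fields such as summary, source, severity, component, group and class, plus free-form custom details.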

Cloudaware CMDB takes full advantage of this event format and enriches event data with details specific to AWS, Azure and Google Cloud.
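The enrichment step can be pictured with a minimal sketch. It assumes a hypothetical in-memory CMDB keyed by cloud resource ID and a sparse alert carrying only `resource_id`, `message` and `severity`. It builds a PagerDuty Events API v2 trigger body whose `payload` uses PD-CEF fields. The CMDB attribute names (`account_id`, `layer`, `business_owner` and so on) are illustrative assumptions, not Cloudaware's actual schema or integration.

```python
import json

# Hypothetical CMDB records keyed by cloud resource ID. Attribute names are
# illustrative only; they are not Cloudaware's actual schema.
CMDB = {
    "arn:aws:dynamodb:us-east-1:123456789012:table/orders": {
        "provider": "AWS",
        "account_id": "123456789012",
        "service": "AWS DynamoDB",
        "application": "checkout",
        "layer": "Database",
        "business_owner": "payments-team@example.com",
        "criticality": "Mission Critical",
    },
}

# PD-CEF accepts only these severity values.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def enrich_event(raw: dict, routing_key: str) -> dict:
    """Turn a sparse monitoring alert into an Events API v2 trigger body."""
    ci = CMDB.get(raw["resource_id"], {})
    severity = raw.get("severity", "error")
    if severity not in VALID_SEVERITIES:
        severity = "error"  # fall back rather than send an invalid value
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": raw["message"],
            "source": raw["resource_id"],
            "severity": severity,
            # Standard PD-CEF fields filled in from the matching CI.
            "component": ci.get("service", "unknown"),
            "group": ci.get("application", "unknown"),
            # The full CI record rides along as custom details.
            "custom_details": dict(ci),
        },
    }


if __name__ == "__main__":
    alert = {
        "resource_id": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        "message": "DynamoDB read throttling on orders table",
        "severity": "critical",
    }
    print(json.dumps(enrich_event(alert, "YOUR_ROUTING_KEY"), indent=2))
```

The resulting body is what would be POSTed to the Events API v2 endpoint (`https://events.pagerduty.com/v2/enqueue`); the sketch only prints it, and `YOUR_ROUTING_KEY` is a placeholder for an integration key.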

For example, when an enriched event enters PagerDuty, PagerDuty can make much better routing decisions based on details such as the AWS Account ID and the fact that the impacted component is AWS DynamoDB. We can also use the Layer attribute to decide which on-call schedule should be invoked.
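The routing decision described above can be modeled as an ordered list of rules where the first match wins. This is a local sketch only: the schedule names, the account ID and the `layer`/`account_id` keys inside `custom_details` are hypothetical, and inside PagerDuty this logic would normally be configured in the product (for example as Event Orchestration rules) rather than written as code.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a predicate over an enriched PD-CEF payload with the name of
# the on-call schedule to page. Schedule names and the account ID are
# hypothetical placeholders.
Rule = Tuple[Callable[[dict], bool], str]


def detail(payload: dict, key: str) -> Optional[str]:
    """Read a CMDB-supplied value from custom_details, if present."""
    return payload.get("custom_details", {}).get(key)


RULES: List[Rule] = [
    # Critical events from the (hypothetical) production account go to the
    # platform team first.
    (lambda p: detail(p, "account_id") == "123456789012"
     and p.get("severity") == "critical", "Platform Primary"),
    # Otherwise, route on the CMDB Layer attribute.
    (lambda p: detail(p, "layer") == "Database", "DBA On-Call"),
    (lambda p: detail(p, "layer") == "Network", "Network On-Call"),
    # DynamoDB is a managed database, so DBAs own it even without a layer.
    (lambda p: p.get("component") == "AWS DynamoDB", "DBA On-Call"),
]

DEFAULT_SCHEDULE = "Ops Catch-All"


def choose_schedule(payload: dict) -> str:
    """Return the schedule of the first matching rule, else the catch-all."""
    for matches, schedule in RULES:
        if matches(payload):
            return schedule
    return DEFAULT_SCHEDULE


if __name__ == "__main__":
    event = {
        "severity": "warning",
        "component": "AWS DynamoDB",
        "custom_details": {"account_id": "210987654321", "layer": "Database"},
    }
    print(choose_schedule(event))  # prints: DBA On-Call
```

Ordering matters: the account-specific rule is checked before the generic Layer rules, so a critical production event is never swallowed by a broader match.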

Another benefit of passing events through a CMDB such as Cloudaware is that it creates a CI-centric view of incidents. This view is what lets us answer questions such as: “Which EC2 instance or load balancer has had the most issues in the last 30 days?”

CMDB list view showing the number of PagerDuty incidents per CMDB CI

There is tremendous value in understanding the incident history of each asset in the CMDB. It helps identify chronic issues, perform root cause analysis and avoid repeating mistakes.
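Once incidents are linked to CIs, the “most issues in the last 30 days” question reduces to a windowed count per CI. A minimal sketch, assuming each incident record already carries hypothetical `ci_id` and timezone-aware `created_at` fields; it does not reflect how Cloudaware stores incident history.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def incidents_per_ci(incidents, now, days=30):
    """Count incidents per CI created in the `days` window ending at `now`."""
    cutoff = now - timedelta(days=days)
    return Counter(
        inc["ci_id"] for inc in incidents if cutoff <= inc["created_at"] <= now
    )


if __name__ == "__main__":
    utc = timezone.utc
    history = [
        {"ci_id": "i-0abc123", "created_at": datetime(2022, 3, 30, tzinfo=utc)},
        {"ci_id": "i-0abc123", "created_at": datetime(2022, 3, 20, tzinfo=utc)},
        {"ci_id": "elb-web", "created_at": datetime(2022, 3, 25, tzinfo=utc)},
        # Older than 30 days, so it falls outside the window.
        {"ci_id": "elb-web", "created_at": datetime(2022, 1, 5, tzinfo=utc)},
    ]
    counts = incidents_per_ci(history, now=datetime(2022, 3, 31, tzinfo=utc))
    print(counts.most_common(1))  # [('i-0abc123', 2)]
```

`Counter.most_common(n)` returns the CIs with the highest counts, which is the list a CI-centric view would sort by.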

The Most Complete Cloud Management Platform

Cloudaware