Implement Usage / Cost Guardrails

First, define your cost governance policies: how much is too much? Then set up automated guardrails to enforce them.

The MegaBankCorp DataFinOps team wanted to control costs along several dimensions: projected budget overruns, and jobs that exceed a certain size, duration, or cost. Working together, Sammi, Raj, and Nan set thresholds for usage and cost: DBUs consumed (for Databricks jobs), how long a job takes to run, the size of the job, the amount of resources it consumes, and dozens of other fine-grained metrics that affect what something costs. For example, a guardrail might trip if a job runs for more than X hours, or if a cluster costs more than $X.

These thresholds are the “trigger condition” settings that, when violated, initiate some kind of action—either an automated alert or an autonomous, preemptive corrective action.
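To make the idea concrete, here is a minimal sketch of a guardrail as a threshold plus an action. It is an illustration only, not MegaBankCorp's or Unravel's actual template: the `Guardrail` class, the metric names (`duration_hours`, `cost_usd`, `dbus_consumed`), the action labels, and every numeric threshold are hypothetical placeholders standing in for the "X" values a team would choose for itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Guardrail:
    """One guardrail: a metric, a threshold, and the action to take on violation."""
    name: str
    metric: str       # key into a job's metrics dict (hypothetical metric names)
    threshold: float  # placeholder value; the real "X" is a policy decision
    action: str       # "alert" or "auto_stop" (hypothetical action labels)

    def is_violated(self, metrics: Dict[str, float]) -> bool:
        # The trigger condition: the observed value strictly exceeds the threshold.
        # A metric that was not reported is treated as 0 and never triggers.
        return metrics.get(self.metric, 0.0) > self.threshold


def evaluate(guardrails: List[Guardrail], metrics: Dict[str, float]) -> List[str]:
    """Return an 'action:name' entry for every guardrail the job violates."""
    return [f"{g.action}:{g.name}" for g in guardrails if g.is_violated(metrics)]


# Placeholder thresholds, for illustration only.
GUARDRAILS = [
    Guardrail("long_running_job", "duration_hours", 4.0, "alert"),
    Guardrail("expensive_cluster", "cost_usd", 500.0, "auto_stop"),
    Guardrail("dbu_budget", "dbus_consumed", 1000.0, "alert"),
]

# Made-up metrics for a single job run.
job_metrics = {"duration_hours": 5.5, "cost_usd": 120.0, "dbus_consumed": 1400.0}

triggered = evaluate(GUARDRAILS, job_metrics)
print(triggered)  # ['alert:long_running_job', 'alert:dbu_budget']
```

Keeping the trigger condition separate from the action means the same threshold check can drive either a notification or an automatic corrective step. In a real deployment the action label would be dispatched to whatever alerting or cluster-management tooling is in use.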

MegaBankCorp has built an ever-expanding library of such guardrails with a high degree of precision and flexibility. They set up guardrails in a template that looks something like this:

Image courtesy of Unravel