The Rise of DataFinOps

Controlling cloud data costs has become a pressing, and growing, concern for anyone running a significant share of their data workloads on modern platforms like Databricks, Snowflake, BigQuery, Dataproc, and Amazon EMR. Uncontrolled costs can jack up your monthly cloud data bill really high, really fast. Getting these unpredictable (and skyrocketing) costs under control has been a persistent challenge for data-forward organizations, with 81% saying that “accurately forecasting and managing cloud spend” remains a top priority.

Yet most organizations have a hard enough time simply understanding exactly where the money is going—let alone actually bringing their cloud data costs under control. Budget overruns are pretty commonplace. Companies are feeling ambushed by their monthly cloud bills. Forecasting has become a guessing game. Migration projects are stalling out. Some are looking at “repatriation,” going back on-prem.

If you run even 10,000 data jobs a month, there are hundreds of thousands of times when some individual somewhere is incurring cloud data costs—whether they think of it that way or not. Most don’t; they have always been (and continue to be) primarily responsible for making sure their data workloads deliver reliable results on time, every time. With the cloud, all those individual “spending decisions” now carry a price tag. And these costs add up very quickly and remain largely uncontrolled. It’s become a bit like the Wild West, with everybody spinning up instances left and right but nobody able to control what’s going on.

Cloud Data Costs Are Your #1 IT Expense

Most of your IT budget is now or soon will be for cloud costs. Sometime in 2025, according to Gartner, enterprise cloud spending will surpass spending on traditional IT—and never look back. And as cloud spending becomes an ever bigger piece of the overall IT budget, the fastest-growing cloud cost category is data workloads.

If they’re not already, cloud data costs will soon be the #1 expense in an enterprise IT budget.

Data sources: Gartner | IDC

Big data is big bucks. The sheer enormity of today’s petabyte-scale data workloads requires a lot of compute and storage resources, and those resources all cost money. Everything running in the cloud runs on a machine somewhere—typically at AWS or Google or Microsoft Azure. Unlike your on-premises data estate (which also costs, but in an entirely different way), the meter is always running in cloud data environments.

There are hundreds of thousands of different ways and places where you incur cloud data costs every month. That means there are hundreds of thousands of opportunities to eliminate overspending and other cost inefficiencies, and to make smarter data-driven decisions about variable pricing options, cluster usage and cost, data storage, and more.

Spoiler alert:
You’re probably inadvertently overspending 30-40% (maybe even 50%) of your cloud data budget right now.

DataOps + FinOps = DataFinOps

DataFinOps combines the best practices of DataOps with the financial management principles of FinOps.

Source: Gartner | Source: FinOps.org

DataOps means different things to different people. There’s plenty of excellent thought leadership that delves into what it is and isn’t, how it’s not simply throwing DevOps people, process, and technology at data and getting DataOps. Here, we’re talking more about a mind-set: the Agile, shift-left, self-service, DevOps-like approach of fixing problems before they become problems, with a high degree of automation. It’s vague, but you know it when you see it.

FinOps, as defined by the FinOps Foundation, is “an evolving cloud financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, technology and business teams to collaborate on data-driven spending decisions.”

DataFinOps is really all about getting everybody on the same page when it comes to making good decisions about spending the cloud data budget. A DataFinOps framework fuses financial discipline, performance engineering discipline, and business priorities. It can be thought of as a process that reconciles the sometimes-competing financial, engineering, and business considerations to find the most cost-effective way to run data workloads in the cloud.

What DataFinOps Does

There’s virtually unanimous agreement among Finance, Engineering, and Business groups that cloud data costs are running rampant and have to be brought under control. They all know they cannot continue to tackle the problem in silos, myopically looking at just their particular piece of the puzzle. DataFinOps provides a framework that makes it easier for these disparate teams to work together to make smarter, data-driven decisions about incurring cloud data expenses. It then empowers the individuals who are incurring these costs to make better choices about their cloud usage and cost.

DataFinOps has a few underlying “north star” principles that mirror those of the FinOps Foundation.

  • Cloud cost governance is a team sport. Too often controlling costs devolves into Us vs. Them friction, frustration, and finger-pointing. When everyone works with their own set of data (and their own tools), you end up with an information mismatch. Teams need to collaborate, pulling in the same direction to the same destination.
  • Spending decisions are driven by business value. Not all cloud usage is created equal. Not everything merits the same priority or same level of investment/expenditure. Value to the business is the guiding criterion for making collaborative and intelligent decisions about trade-offs between performance, quality, and cost.
  • Everyone takes ownership of their cloud usage. Holding individuals accountable for their own cloud usage—and costs—essentially shifts budget responsibility left, onto the folks who actually incur the expenses. This is crucial to controlling cloud costs at scale, but to do so, you absolutely must empower engineers (including and maybe even especially Operations teams) with the self-service optimization capabilities to “do the right thing” themselves quickly and easily.
  • Reports are accessible and timely. To make data-driven decisions, you need accurate, real-time data. The various players collaborating on these decisions all bring their own wants and needs to the table, and everybody must be working from the same information, seeing the issue(s) the same way—a single pane of glass for a single source of truth. Dashboards and reports must be visible, understandable, and practical for a wide range of people making these day-to-day cost-optimization spending decisions.

Bear in mind that DataFinOps is not just about reducing costs, though you will probably save millions from eliminating waste alone. We’ve seen data-intensive enterprises lower cloud data costs by 30-40% (sometimes 50%) by taking a DataFinOps cost-governance approach. DataFinOps is mostly about making sure you get the most value out of your modern data stack investments at the most cost-effective price. Essentially, it makes sure you get the most bang for your buck.

DataFinOps stitches together and correlates performance data (usually captured by data observability) with financial data (from the cloud vendors) to show both how much you’re spending overall (and where, who, when, and why) and a quantified “per unit cost” view of everything you have running in your cloud data estate.
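To make the "per unit cost" idea concrete, here is a minimal Python sketch of that correlation step. Everything in it is an illustrative assumption rather than any vendor's actual schema or method: the record fields (job_id, team, cluster_id, runtime_hours, cost_usd), the sample numbers, and the choice to split each cluster's bill across jobs in proportion to their runtime. Real allocation models can be considerably more nuanced.

```python
from collections import defaultdict

# Hypothetical performance records, as an observability tool might export them.
job_runs = [
    {"job_id": "etl_orders", "team": "sales", "cluster_id": "c1", "runtime_hours": 3.0},
    {"job_id": "ml_features", "team": "ds", "cluster_id": "c1", "runtime_hours": 1.0},
    {"job_id": "daily_report", "team": "sales", "cluster_id": "c2", "runtime_hours": 2.0},
]

# Hypothetical billing line items, as they might appear in a cloud vendor's bill.
billing = [
    {"cluster_id": "c1", "cost_usd": 40.0},
    {"cluster_id": "c2", "cost_usd": 10.0},
]


def allocate_costs(job_runs, billing):
    """Attach a cost to each job run by splitting every cluster's bill
    across the runs on that cluster, in proportion to runtime hours.
    (Runtime-proportional allocation is an assumption of this sketch.)"""
    cluster_cost = defaultdict(float)
    for item in billing:
        cluster_cost[item["cluster_id"]] += item["cost_usd"]

    cluster_hours = defaultdict(float)
    for run in job_runs:
        cluster_hours[run["cluster_id"]] += run["runtime_hours"]

    allocated = []
    for run in job_runs:
        total_hours = cluster_hours[run["cluster_id"]]
        # Guard against clusters whose recorded runtime is zero.
        share = run["runtime_hours"] / total_hours if total_hours else 0.0
        allocated.append({**run, "cost_usd": share * cluster_cost[run["cluster_id"]]})
    return allocated


def roll_up(allocated, key):
    """Sum allocated cost by any dimension, e.g. 'job_id' or 'team'."""
    totals = defaultdict(float)
    for row in allocated:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


allocated = allocate_costs(job_runs, billing)
per_job = roll_up(allocated, "job_id")
per_team = roll_up(allocated, "team")
print(per_job)
print(per_team)
```

The same allocated rows can be rolled up by job, team, or any other dimension you record, which is what lets Finance, Engineering, and Business groups look at the same numbers from their own angle.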