Out of Control: Managing log data costs in an economic downturn
How we got here
Log management costs are growing, and that growth is a concern for companies, users, and developers trying to scale their organizations in today’s macro environment. Companies are investing in systems that collect data from the cloud, applications, and infrastructure in order to monitor their performance and security. The amount of machine data generated every day is skyrocketing as businesses digitize and automate operations. From logs and metrics in cloud environments to sensor data in IoT devices, the ever-increasing volume and variety of machine data are overwhelming traditional tools. The result is increasing data storage costs, decreasing productivity, and growing complexity in analyzing and managing data.
The stakes are high - the cost of losing control over your observability data can be devastating. Application downtime can result in the loss of millions of dollars and customer trust. A breach of personally identifiable information (PII) could mean lawsuits or fines from regulators like the Federal Trade Commission (FTC). Companies invest heavily in monitoring systems across their IT infrastructure with an eye toward better security and performance, and log cost escalation is becoming a growing concern within those organizations. Given the macroeconomic situation, most Engineering and Security groups are under significant pressure to reduce costs without sacrificing customer delight or their security and compliance posture.
The dark side of exponential growth
Some estimates suggest that the amount of log volume generated by businesses is increasing by 35% year over year. We see this quite often working closely with enterprises - some organizations are forecasting their log data will double in the next two years. Historically, logging APIs have been some of the easiest to use, and as such developers tend to write plenty of logs. This increase in data can also be attributed to the rapid growth in Public Cloud usage. Most enterprises are investing heavily in Cloud-native application development, and the lift-and-shift of existing systems from on-premises to the Cloud is happening at unprecedented rates as well. Cloud infrastructure and application logs - especially Flow Logs, Kubernetes logs, CloudTrail logs, Firewall logs, Syslog, and others - are growing very rapidly, in some cases at the rate of 5x year over year, as these logs tend to be very chatty.

This means your team has to deal with an ever-increasing amount of data, and that is a problem for your company as a whole. The logs have devolved into noise: they fail to provide quick answers about failures and successes. There’s a lot of repetition in the data, and most of that repetition isn’t useful at all. Some log data, such as CDN logs, can contain up to 95% noise, with specific metadata and messages repeating thousands of times. All this repetition doesn’t really contribute to analytics or insights!
With the exponential growth in data, log management costs are growing rapidly as well. Your DevOps and SecOps teams’ cost of managing logs and observability tools grows as more data is created, because most observability/log management tools and SIEMs are priced based on how much data is ingested and stored. Enterprises we have spoken with typically see log management and observability costs grow anywhere from 30-100% year over year. An average mid-market organization spends $5M-$10M+ per year on observability, and larger enterprises typically spend $10M-$15M+ every year. Since most of this data is noise, enterprises are throwing away money, as the value isn’t proportional to the investment.
Light at the end of the tunnel
Now, the good news is that noise in log data can be reduced drastically by applying a variety of optimization techniques designed specifically for observability data, resulting in better log cost management. These techniques effectively separate valuable log data from the low-value noise. Smart classification and filtering can identify recurring themes in the log data and deduce what’s of value and what’s not. Also, log data varies greatly by its source; for instance, the log data generated by Flow Logs is different from application log data. As such, it’s not one size fits all, and multiple techniques need to be applied: a technique that works on one log data source might not be very effective on another.
As an example, AWS CloudTrail data is notably repetitive, with metadata repeating in each and every line and contributing significantly to the noise. Much of this noise provides no value to SRE or Security teams. About 40-50% of CloudTrail data can be summarized by consolidating the metadata, turning a few hundred lines into just a dozen lines without losing any valuable insights in the data. Another example shows up with application logs, where filtering based on tags is key, as the data has likely been enriched with tags by collectors/agents such as fluentd, Splunk heavy forwarders, etc. Furthermore, ML-based techniques can be used to cut through noise even further. Log data and its context can be used to train models that detect data that doesn’t really add any value. This is very useful, especially with Kubernetes and other application data that have logs pertaining to transactions such as customer orders and payments in an e-commerce system.
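To make the consolidation idea concrete, here is a minimal illustrative sketch in Python. It assumes CloudTrail events arrive one JSON object per line (as many collection pipelines emit them) and groups near-duplicate events on a handful of standard CloudTrail fields (eventSource, eventName, userIdentity.arn); the grouping keys and the helper itself are illustrative choices, not a description of any specific vendor’s pipeline.

```python
import json
from collections import Counter

def consolidate_cloudtrail(lines):
    """Collapse near-duplicate CloudTrail events into summary records.

    Assumes each input line is a single CloudTrail event serialized as JSON.
    Events that share the same source, name, and caller are reduced to one
    record carrying an occurrence count, turning hundreds of repetitive
    lines into a handful of summaries.
    """
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        key = (
            event.get("eventSource"),                  # e.g. "s3.amazonaws.com"
            event.get("eventName"),                    # e.g. "GetObject"
            event.get("userIdentity", {}).get("arn"),  # who made the call
        )
        counts[key] += 1

    for (source, name, caller), n in counts.items():
        yield {
            "eventSource": source,
            "eventName": name,
            "userIdentity": caller,
            "occurrences": n,  # one summary line stands in for n raw events
        }
```

A similar grouping step, keyed on collector tags instead of CloudTrail metadata, can drive the tag-based filtering described above for application logs.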
A toolbox of such techniques can be very handy in classifying log data into what’s valuable and what’s not. In the work we have done with enterprises, this noise reduction typically averages anywhere from 50-80%, with proportional cost savings. The valuable data can be stored in your observability platform or SIEM such as Splunk, Datadog, Elastic, Sumo Logic, New Relic, etc. The lower log volume in your observability platform means you pay less for storing log data. You are paying to store data that has value and a very high signal-to-noise ratio, resulting in superior analytics and insights and leading to better performance and results. The noisy, low-value data can be stored in cheap blob storage or data lakes so it’s always available for digging deeper and for compliance purposes. In most cases, the cost of storing log data in blob storage can be 1/100th of the cost of storing this data in Splunk, Elasticsearch, or Datadog. We have been able to realize significant cost savings to the tune of 30-50% for most observability architectures and environments! Not only have you saved significant costs, but you have also made your observability stack better and more efficient.
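As a rough illustration of this routing step, the sketch below (continuing the Python example) sends records a classifier labels as signal to the SIEM or observability platform and everything else to low-cost object storage. The classify function and the sink objects are hypothetical placeholders, not a real API.

```python
def route_records(records, siem_sink, blob_sink, classify):
    """Split log records between premium and low-cost destinations.

    classify(record) is assumed to return "signal" for high-value data and
    "noise" for everything else; siem_sink and blob_sink are placeholder
    writers for the observability platform and blob/data-lake storage.
    """
    for record in records:
        if classify(record) == "signal":
            siem_sink.write(record)   # indexed, searchable, full-cost storage
        else:
            blob_sink.write(record)   # cheap blob storage for compliance and deeper digging
```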
Conclusion
Log data is growing. Log management costs are growing. Observability tools are multiplying. Log cost management is needed more than ever, as more data is created and your business spends money without extracting proportional value. However, with the right tools, you can significantly de-noise this very noisy dataset and cut high observability costs while making your observability better through faster incident detection and response. This results in better security, higher customer satisfaction, and top-line growth.
Learn More
For more information on how you can save 50% or more on your SIEM and observability costs with the AI-powered Observability Pipeline, read the Observo.ai white paper, Elevating Observability with AI.