Optimizing VPC Flow Logs - Part 2
Overview
As cloud deployments scale, Amazon Web Services (AWS) VPC flow logs become an invaluable network visibility and security tool. They are also one of the most voluminous classes of data, making them an expensive choice to add to analytics platforms. With growing infrastructure and traffic, managing these logs presents significant challenges.
In part 1 of this series, we took a look at common use cases and problems associated with storing and processing VPC Flow Logs. In this blog post, we'll delve into how Observo.ai addresses these challenges by optimizing VPC Flow Logs for cost efficiency and enhanced performance, all while preserving their analytical value. Stay tuned for part 3 of this series, where we'll examine a customer case study.
Challenges working with VPC Flow Logs
The sheer volume of data generated can overwhelm traditional log management solutions, leading to:
1. Storage and cost concerns
- Extensive flow logs translate to massive storage requirements and potentially hefty CloudWatch Logs charges.
- Optimizing retention policies and employing data compression techniques become crucial.
- Even when the logs are sent to other log aggregation systems, their size quickly becomes the bottleneck.
2. Performance bottlenecks
- Ingesting and querying massive datasets can strain CloudWatch Logs infrastructure, impacting analysis performance and real-time visibility.
3. Log sprawl and complexity
- Numerous flow logs across diverse VPCs create a fragmented landscape, hindering centralized monitoring and troubleshooting.
- Streamlining log organization and adopting standardized naming conventions become essential.
4. Verbosity
- VPC flow logs are highly detailed and capture extensive information on network flows. A flow log that includes every available field through format version 5 has 29 fields.
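To make the verbosity concrete, here is a minimal sketch of parsing a single space-separated flow log record in Python. It assumes the record carries only the 14 version 2 fields; a flow log with a custom format would need a different field list. The names `V2_FIELDS` and `parse_record` and the sample record are ours, for illustration only.

```python
# Field names of a version 2 VPC flow log record, in order.
V2_FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

# Fields we would like as integers when they hold a number.
INT_FIELDS = {"srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_record(line, fields=V2_FIELDS):
    """Turn one space-separated flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    record = dict(zip(fields, values))
    for name in INT_FIELDS & set(fields):
        # Records without data use "-" as a placeholder, so only convert digits.
        if record[name].isdigit():
            record[name] = int(record[name])
    return record


# Illustrative sample record (values are made up for this sketch).
sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_record(sample)
```

Even this smallest field set produces 14 values per record, and each additional field adds bytes to every single line that gets stored and indexed.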
Commonly Used Approaches
Faced with these challenges, companies usually employ one or more of the following strategies to manage and make sense of VPC flow data.
1. Ad Hoc Pipelines/Homegrown Tooling
- Many companies resort to creating ad-hoc pipelines and developing homegrown tools tailored to their specific needs.
- While this provides a customized solution, the downside is the lack of versatility. Such tooling is often intricate and narrowly focused, making it challenging for other organizations with similar requirements to leverage without significant additional investment.
- Tools such as Spark are commonly used for this purpose, and they tend to carry heavy compute and infrastructure requirements.
2. External Systems with Added Costs and Overhead
- Another common approach involves integrating external systems, often OSS solutions, to handle the voluminous VPC flow logs.
- However, this strategy introduces additional cost as well as technical and management overhead, because more and more developer time goes into managing these systems and keeping the lights on (KTLO).
- Such systems may require substantial configuration and ongoing maintenance, further complicating the operational landscape.
3. Cold Storage
- A prevalent but less optimal approach is "store and forget." In this scenario, organizations archive logs without deriving real-time insights from them. Instead, they periodically run batch workloads to extract the essence of these logs.
- This approach is marked by a lack of immediacy in response and a reliance on periodic, resource-intensive analyses rather than continuous monitoring.
Using Observo.ai to Dynamically Optimize VPC Flow Logs
In response to the complexities associated with VPC flow logs, Observo.ai offers a comprehensive solution that not only addresses existing challenges but also introduces innovative features for enhanced operational efficiency and real-time insights.
1. Aggregations
- Smart Summarizations with Machine Learning (ML):
a. Observo.ai introduces zero-click smart summarizations powered by ML, which work without user intervention. This summarization results in more than 80% log reduction. Importantly, the reduction is achieved while retaining the integrity of critical information, so no valuable data is lost.
b. Smart Summarization works by identifying network flows within VPC flow logs and aggregating them. A network flow can be defined as a tuple of:
- srcaddr
- srcport
- dstaddr
- dstport
- protocol
c. After these network flows are identified, the matching records are aggregated based on those fields.
- Advanced Log Aggregation:
- Compared to Smart Summarization, this approach lets users define the network flow identification semantics themselves. Because the user is completely in control, they can draw on knowledge of their domain and infrastructure layout to perform more aggressive aggregations.
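The aggregation idea can be sketched in a few lines of Python. This is not Observo.ai's implementation and involves no ML; it is a plain group-by over the five-field flow tuple described above, with the key made configurable to mimic user-defined flow semantics. The function name `aggregate`, the `count` field, and the sample records are assumptions made for illustration.

```python
# The five fields that identify a network flow.
FLOW_KEY = ("srcaddr", "srcport", "dstaddr", "dstport", "protocol")


def aggregate(records, key_fields=FLOW_KEY):
    """Collapse records that share the same key into one summary record each."""
    summary = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        agg = summary.get(key)
        if agg is None:
            # First record for this flow: start a new summary entry.
            agg = {f: rec[f] for f in key_fields}
            agg.update(packets=0, bytes=0, start=rec["start"], end=rec["end"], count=0)
            summary[key] = agg
        agg["packets"] += rec["packets"]
        agg["bytes"] += rec["bytes"]
        agg["start"] = min(agg["start"], rec["start"])  # earliest window start
        agg["end"] = max(agg["end"], rec["end"])        # latest window end
        agg["count"] += 1                               # raw records merged
    return list(summary.values())


# Hypothetical records: the first two belong to the same flow.
records = [
    {"srcaddr": "10.0.1.5", "srcport": 44321, "dstaddr": "10.0.2.9", "dstport": 443,
     "protocol": 6, "packets": 10, "bytes": 1200, "start": 100, "end": 160},
    {"srcaddr": "10.0.1.5", "srcport": 44321, "dstaddr": "10.0.2.9", "dstport": 443,
     "protocol": 6, "packets": 5, "bytes": 600, "start": 160, "end": 220},
    {"srcaddr": "10.0.1.5", "srcport": 44322, "dstaddr": "10.0.2.9", "dstport": 443,
     "protocol": 6, "packets": 2, "bytes": 80, "start": 100, "end": 160},
]

by_flow = aggregate(records)
# A user-defined key that ignores the ephemeral source port merges more aggressively.
coarse = aggregate(records, ("srcaddr", "dstaddr", "dstport", "protocol"))
```

Here the full five-field key leaves two summary records, while the coarser key folds all three raw records into one. Packet and byte totals are preserved in both cases, which is the trade a user makes when choosing how aggressive the aggregation should be.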
2. Real-Time Processing Pipelines
- Real-Time Data Visibility:
a. Observo.ai provides a dynamic and real-time view of your data, enabling swift and informed decision-making. This feature ensures that your security team has up-to-the-minute information to respond promptly to potential incidents.
[Screenshots from our Flow Log Pipelines]
- Reduced Mean Time to Detect (MTTD):
a. By processing data in real time, Observo.ai helps teams get alerted quickly to any ongoing security incident within their infrastructure.
3. Volume Reduction and Enrichment Capabilities
- Customized Pipelines:
a. Observo.ai offers customizable field-dropping capabilities. As the VPC flow log field definitions show, a single record can carry up to 29 fields (through format version 5), and a team or organization may not need all of that information.
b. There may be additional information that we want to tag our VPC flow logs with, such as internal organization or team names. With Observo.ai, this is straightforward to achieve.
c. You can tailor your VPC flow log handling with Observo.ai's flexible pipelines. Whether it's based on subnet ID, CIDR, or other criteria, you have the power to enrich or drop flows according to your specific requirements. For example, a rule in Observo.ai can drop all flow logs that originate from interactions between private subnets.
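As a rough illustration of these three steps (dropping flows, dropping fields, and enriching), here is a minimal Python sketch. It does not use Observo.ai's rule syntax, which is not shown in this post; the CIDR ranges, team names, and the `process` function are hypothetical.

```python
import ipaddress

# Hypothetical fields a team decides to keep; everything else is dropped.
KEEP_FIELDS = {"srcaddr", "dstaddr", "srcport", "dstport",
               "protocol", "packets", "bytes", "action"}

# Hypothetical private subnets and the teams that own them.
TEAM_BY_SUBNET = {
    ipaddress.ip_network("10.0.1.0/24"): "payments",
    ipaddress.ip_network("10.0.2.0/24"): "search",
}
PRIVATE_SUBNETS = list(TEAM_BY_SUBNET)


def in_any(addr, networks):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


def process(record):
    """Drop private-to-private flows, trim fields, and tag the owning team."""
    if in_any(record["srcaddr"], PRIVATE_SUBNETS) and in_any(record["dstaddr"], PRIVATE_SUBNETS):
        return None  # traffic between private subnets: drop the record entirely
    slim = {k: v for k, v in record.items() if k in KEEP_FIELDS}
    src = ipaddress.ip_address(record["srcaddr"])
    for net, team in TEAM_BY_SUBNET.items():
        if src in net:
            slim["team"] = team  # enrichment: tag the record with the owning team
            break
    return slim


internal = {"account-id": "123456789010", "srcaddr": "10.0.1.5", "dstaddr": "10.0.2.9",
            "srcport": 44321, "dstport": 443, "protocol": 6,
            "packets": 10, "bytes": 1200, "action": "ACCEPT", "log-status": "OK"}
external = dict(internal, dstaddr="8.8.8.8")

dropped = process(internal)
kept = process(external)
```

The first record is discarded because both endpoints sit in private subnets. The second survives with fewer fields and an added `team` tag, so downstream queries can filter by owner without a separate lookup.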
4. Developer Productivity
- Most of the pipelines and primitives are easy to use and usually require no user interaction to start optimizing.
- Our platform is constantly learning and doesn't require manual intervention, which saves operators from continually observing and tweaking these pipelines.
- Because data volumes are reduced, queries run faster and closer to real time, which makes increasingly complex analysis of these flow logs practical. This also leads to cost and time savings.
Conclusion
VPC Flow Logs provide valuable insights into network traffic within your VPC. By creating and analyzing flow logs, you can enhance your network security, monitor traffic patterns, and better understand your VPC's behavior. Understanding the capabilities and limitations of flow logs is crucial for effective network management and troubleshooting. Observo.ai offers an AI-powered Observability Pipeline that can aggregate log data with our Smart Summarizer feature, provide real-time visibility into your data, eliminate portions of logs that provide no analytical value, and increase engineer productivity by reducing noise and speeding up queries.
Keep an eye out for part 3 of this series, where we explore a customer use case, providing practical insights into Observo.ai in a real-world scenario.
Learn More
For more information on how you can save 50% or more on your security and observability costs while resolving incidents more than 40% faster with the AI-powered observability pipeline, read the Observo.ai white paper, "Elevating Observability with AI."