Cost optimization in AWS - Part II

Having identified your team's strengths and weaknesses, understood your data streams, and analyzed your AWS bill as discussed in Part I, it's time to plan and execute the changes. Drawing from my experience with S3-based data lakes, Redshift data warehouses, and Glue, Lambda, and SQS for intermediate processing, I offer the following suggestions.

Remember, the goal is to reduce, in order of priority, I/O, processing, memory footprint, and storage. Storage is usually the least expensive item, but its cost still accumulates over time.

Eliminate Unnecessary Duplication of Work and Data

The aim here is to avoid paying twice for the same work or the same data.

Begin by ensuring you're loading every data source incrementally. Some sources don't offer Change Data Capture (CDC), but you can often devise other ways to pull only the data that has changed.

One such method is to generate a hash of all values per ID at both the source and the destination, compare the hashes, and copy only the rows whose hashes differ.
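Here is a minimal sketch of that comparison in Python, assuming you can read both sides into memory as mappings from ID to row; the column list and function names are illustrative:

```python
import hashlib

def row_hash(row, columns):
    """Hash all column values of one row in a stable column order."""
    joined = "|".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def changed_ids(source_rows, dest_rows, columns):
    """Return the IDs whose row hashes differ between source and destination.

    source_rows / dest_rows: dicts mapping ID -> row (a dict of column values).
    """
    changed = []
    for key, src_row in source_rows.items():
        dst_row = dest_rows.get(key)
        if dst_row is None or row_hash(src_row, columns) != row_hash(dst_row, columns):
            changed.append(key)
    return changed
```

In practice you would push the hashing down into SQL on each side and transfer only the differing IDs, but the comparison itself stays this simple.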

Consider running frequent incremental loads only for frequently used data, and pair them with an "eventual consistency" or "healing" job that performs a full load less often, say weekly or monthly.
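One lightweight way to wire this up is to let the job itself pick its mode on each run. A sketch, assuming the job runs on a fixed schedule; the Sunday-morning cut-off is an arbitrary choice:

```python
from datetime import datetime, timezone

def load_mode(now=None):
    """Choose between a cheap incremental load and a weekly 'healing' full load."""
    now = now or datetime.now(timezone.utc)
    # Full reload early Sunday morning (UTC); every other run is incremental.
    if now.weekday() == 6 and now.hour < 6:
        return "full"
    return "incremental"
```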

Next, assess how many copies of the data you keep and how many you actually use. I typically keep one full verbatim copy, one full transformed copy (including data type conversions), and derived datasets such as data marts. Anything outside these categories is expendable and should be minimized.
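To see what each copy actually costs, it helps to measure storage per top-level prefix. A sketch using boto3; the bucket name and the raw/transformed/marts layout are hypothetical, and for very large buckets S3 Inventory or Storage Lens is cheaper than listing objects:

```python
import boto3

def storage_by_prefix(bucket, prefixes):
    """Sum object sizes under each prefix and return totals in GiB."""
    s3 = boto3.client("s3")
    totals = {}
    for prefix in prefixes:
        paginator = s3.get_paginator("list_objects_v2")
        total_bytes = sum(
            obj["Size"]
            for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
            for obj in page.get("Contents", [])
        )
        totals[prefix] = total_bytes / 1024 ** 3
    return totals

# One verbatim copy, one transformed copy, and the data marts.
print(storage_by_prefix("my-data-lake", ["raw/", "transformed/", "marts/"]))
```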

Evaluate Your Logging Practices

Over-retaining logs is another common pitfall. Most AWS services let you configure the log destination; use it to monitor what's being logged and set up lifecycle rules. If compliance requires keeping logs for years, S3 Intelligent-Tiering is a feature I highly recommend.
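A sketch of such a lifecycle rule with boto3; the bucket, prefix, and retention period are placeholders to adapt to your own requirements:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-service-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Move logs to Intelligent-Tiering after 30 days...
                "Transitions": [{"Days": 30, "StorageClass": "INTELLIGENT_TIERING"}],
                # ...and delete them once the retention period (here ~7 years) ends.
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```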

Lastly, be aware that some services, like Glue, use S3 as temporary (staging) storage. If you're copying large amounts of data, you may be keeping an extra copy of it there alongside your logs. You can't disable this behavior, so consider reducing the amount of data each job moves instead.

Leverage Capacity Reservations and Lower Execution Tiers

Data teams tend to work on long-lived workloads, so capacity reservations make sense for long-term cost control: you pre-purchase what you're sure to still be using a year from now. With Redshift, for example, you're unlikely to stop using the cluster or shrink the number of nodes within the next year, so reserving that capacity can cut that part of your bill by about 25%, and it takes less than 10 minutes to set up.
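If you prefer to script it, the Redshift API exposes the same flow as the console. A sketch with boto3; the node type and node count are examples, and the purchase call is left commented out since it commits you for the full term:

```python
import boto3

redshift = boto3.client("redshift")

# List one-year offerings for the node type you already run (example: dc2.large).
offerings = redshift.describe_reserved_node_offerings()["ReservedNodeOfferings"]
for offering in offerings:
    if offering["NodeType"] == "dc2.large" and offering["Duration"] == 31536000:  # 1 year in seconds
        print(offering["ReservedNodeOfferingId"], offering["OfferingType"], offering["FixedPrice"])

# Once you've picked an offering, the purchase itself is one call:
# redshift.purchase_reserved_node_offering(
#     ReservedNodeOfferingId="<chosen-offering-id>",
#     NodeCount=2,
# )
```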

Another tip: Glue offers two execution classes, standard and flex. While the official guideline suggests using flex only for development workloads, I've used it successfully in production without issues. Combining flex execution with job retries has proven reliable in my experience.
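A sketch of defining such a job with boto3; the job name, role, script location, and worker settings are placeholders:

```python
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="nightly-load",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/nightly_load.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=5,
    ExecutionClass="FLEX",  # run on spare capacity at a lower rate
    MaxRetries=2,           # retries absorb the occasional flex interruption
)
```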

Remember, AWS cost optimization is about making informed decisions that align with your business objectives and budget constraints. By planning and executing these changes, you're well on your way to mastering AWS cost optimization.
