
As companies scale, infrastructure grows exponentially – and so do logs. Yet, not all of that data is meaningful, and storing it can become a major cost driver. Oleksandr Shevchenko, a Site Reliability Engineer at one of Europe’s largest banks, shares how to reduce cloud spend through smarter logging and infrastructure practices, and what tools help make it happen.
Oleksandr, as a Site Reliability Engineer at a major European bank, what’s your starting point when trying to reduce cloud costs without compromising performance?
The first step is analyzing billing reports to identify which components are driving costs – often it's things like log storage, underutilized VMs, or oversized clusters.
When we see exponential growth, we investigate the root causes and coordinate with other teams to determine whether those features are truly necessary. For instance, if a particular logging process isn’t mission-critical, then we reduce the log retention or disable it altogether.
Another tactic is resource scaling. We freeze or scale down infrastructure during low-demand periods – say, over weekends – then scale it back up on Monday. It’s a simple change, but it adds up over time.
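The weekend scale-down logic can be sketched in a few lines. This is a hedged illustration, not the team's actual tooling – in practice the schedule would live in an autoscaler or CI job, and the replica count here is invented:

```python
from datetime import date

# Hypothetical helper: decide how many replicas a non-production
# service should run on a given day. Weekdays get full capacity;
# weekends are scaled to zero, as described above.
def desired_replicas(day: date, weekday_replicas: int = 3) -> int:
    # date.weekday() returns 0 (Monday) through 6 (Sunday)
    return weekday_replicas if day.weekday() < 5 else 0

print(desired_replicas(date(2024, 6, 8)))   # a Saturday
print(desired_replicas(date(2024, 6, 10)))  # a Monday
```

A scheduler runs this daily and applies the result to the deployment; Monday mornings the environment comes back up on its own.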
We also continually reassess which processes are actually needed. In production, we might prioritize resilience and allocate more resources, but in development environments, we can use cheaper instances. That saves money without impacting performance where it really matters.
Can you share a case where properly configured autoscaling helped your team save money? And how do you decide between horizontal and vertical scaling?
We routinely review the resource needs of our dev and staging environments. For example, we realized we didn’t need to keep multiple pods running 24/7 – just one or two were enough. We configured the app to run only on weekdays, which significantly reduced resource usage.
Then we analyzed CPU and memory utilization. Where we found bottlenecks, we scaled up. Where utilization was under 20%, we scaled down without affecting resilience.
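The review logic above boils down to a threshold check. A minimal sketch – the 20% floor mirrors the figure mentioned here, while the 80% ceiling is an assumed bottleneck indicator, not a number from the interview:

```python
# Hypothetical sketch of the utilization review: given average CPU
# utilization over the review window, recommend an action.
def scaling_decision(avg_cpu_utilization: float) -> str:
    if avg_cpu_utilization >= 0.80:   # sustained bottleneck: add capacity
        return "scale up"
    if avg_cpu_utilization < 0.20:    # mostly idle: reclaim capacity
        return "scale down"
    return "keep as is"

print(scaling_decision(0.92))
print(scaling_decision(0.12))
```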
As for choosing between horizontal and vertical scaling, it depends on the app. Horizontal scaling is ideal for microservices; vertical scaling works better for monolithic applications.
When it comes to reducing cloud costs through log optimization, what tools and strategies do you use – especially in AWS and GCP?
It varies by project. Lately, I’ve been using Cloud Logging and Log Router in GCP. Logs are routed to Cloud Storage for analysis. To cut down on noise, I configured filters in the Log Router to exclude non-essential logs before they’re stored.
I ran a detailed audit, created a report, and worked with teams to determine which logs were critical. After implementing the filters, we reduced log volume by about 20x, which translated into significant storage cost savings.
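In GCP, Log Router exclusion filters drop matching entries before they reach the storage sink. The effect can be sketched roughly as follows – the service names and the severity-based rule are illustrative stand-ins for the real filters, which are written in the Cloud Logging query language:

```python
# Hypothetical sketch of what an exclusion filter does: drop entries
# from noisy, non-critical sources before they are stored.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}
NOISY_SERVICES = {"healthcheck", "request-tracer"}  # invented names

def keep_entry(entry: dict) -> bool:
    if entry["service"] in NOISY_SERVICES:
        return False
    return SEVERITY_RANK[entry["severity"]] >= SEVERITY_RANK["WARNING"]

entries = [
    {"service": "payments", "severity": "ERROR"},
    {"service": "payments", "severity": "DEBUG"},
    {"service": "healthcheck", "severity": "INFO"},
]
kept = [e for e in entries if keep_entry(e)]
print(len(entries), "->", len(kept))  # volume before and after filtering
```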
How do you decide between managed services (like AWS RDS or Azure App Service) and more flexible but resource-intensive setups like Kubernetes on EC2?
It depends on the project’s goals and priorities. If speed and ease of deployment are critical, managed services are the better option. Services like AWS RDS or Azure App Service work out of the box – you don’t need to worry about setup, patches, or basic infrastructure monitoring. This is especially practical if you’re committed to a single cloud provider.
But if resilience, flexibility, and cloud-provider independence are priorities, it makes sense to go with a more customizable solution. Deploying on VMs or Kubernetes gives you full control and makes it easier to migrate between clouds – or even move on-prem – if costs spike or security requirements change.
In banking, where I currently work, security and fault tolerance are non-negotiable. So we often use hybrid setups – part cloud, part on-prem. It’s more complex, but it ensures service continuity under any conditions. In other industries, where requirements are lighter, we can comfortably rely on managed services with region-level redundancy.
Ultimately, it’s all about balancing convenience, control, and risk. If you need to scale fast and flexibility isn’t a priority, go with managed services. If you need independence and stability, then roll up your sleeves and build it yourself.
Tell us about your experience with Terraform and Terragrunt for infrastructure automation. How have these tools impacted your cost efficiency and development speed?

Terraform and Terragrunt are incredibly powerful for managing cloud resources efficiently. With them, we can spin up infrastructure, create snapshots, or decommission unused resources in minutes. Everything is described in code, which boosts speed and makes deployments predictable.
Terraform helped us automate infrastructure rollout, which made scaling and launching new services much easier. Terragrunt, on the other hand, helped organize the codebase, reduce duplication, and manage multiple environments more effectively – especially when handling large, modular infrastructures.
We also gained better visibility: every Terraform plan shows exactly what will be created, changed, or destroyed. This reduces human error and gives us clear control over who’s doing what. All changes go through pull requests and code review – no manual edits in the cloud console. That improves both security and stability.
On top of that, we implemented cost-optimized configurations – like switching instance types or scaling down on weekends. That gave us greater flexibility and led to noticeable cost reductions.
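The environment-dependent sizing idea reduces to a lookup that the infrastructure code consumes. A small sketch, assuming GCP machine types – the specific types and node counts are illustrative, not the bank's actual configuration:

```python
# Hypothetical per-environment sizing: cheaper instances in dev,
# more resilient capacity in prod, as described above.
SIZING = {
    "dev":     {"instance_type": "e2-small",      "min_nodes": 1},
    "staging": {"instance_type": "e2-medium",     "min_nodes": 1},
    "prod":    {"instance_type": "n2-standard-4", "min_nodes": 3},
}

def node_config(environment: str) -> dict:
    # Fall back to the cheapest profile for unknown environments
    return SIZING.get(environment, SIZING["dev"])

print(node_config("prod")["instance_type"])
```

With Terraform and Terragrunt, values like these typically live in per-environment variable files, so switching instance types is a one-line change that goes through pull request review like everything else.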
Have you ever proposed an unpopular cost-saving measure? How did you persuade your team or leadership to go along with it?
Yes, and it usually comes down to clear communication. I’d start by preparing a short demo or presentation for the dev team – outlining why the change is needed, how it affects their workflow, and the potential benefits. Then we’d openly discuss risks and decide whether to move forward after testing.
One case was disabling a monitoring system that duplicated metrics from another source. Initially, the team resisted. So I ran a test phase with the system turned off, closely monitored the results, and no one noticed the difference. That helped build trust.
Another case involved excessive logging. We were getting floods of non-critical logs from every pod. I analyzed a month’s worth of logs, identified the noisiest services, and worked with the team to decide what was actually useful.
Then I added filters to the Log Router and set up alerts only for key events. We cut log volume by nearly 20x. Not only did that slash storage and processing costs – it also made logs easier to work with for developers.
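The audit step – finding the noisiest emitters in a month of logs – can be sketched with a simple counter. The line format and service names here are assumptions for illustration:

```python
from collections import Counter

# Hypothetical sketch of the audit: count log lines per service
# over a sample and surface the noisiest emitters.
def noisiest_services(log_lines: list[str], top_n: int = 2) -> list[tuple[str, int]]:
    # Assume each line is prefixed "service-name: message"
    counts = Counter(line.split(":", 1)[0] for line in log_lines)
    return counts.most_common(top_n)

sample = [
    "sidecar-proxy: connection reused",
    "sidecar-proxy: connection reused",
    "sidecar-proxy: connection reused",
    "payments: transaction failed",
    "auth: token refreshed",
]
print(noisiest_services(sample))
```

Ranking like this gives you a concrete list to bring to the owning teams, rather than a vague complaint that "logging is too noisy."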
What experience most shaped your understanding of how engineering practices can directly impact a company’s bottom line?
Early in my career, I worked with large cloud providers and data centers. That shaped my mindset around resource usage. From the start, I was focused on figuring out what truly needed monitoring and what could be turned off to avoid waste.
I learned how to optimize resources at every level – from compressing non-essential data to configuring autoscaling to avoid over-provisioning. Over time, I realized that managed services like Amazon ECS or Google Cloud Run can be game-changers – they eliminate the need to manage infrastructure, updates, or high availability. It’s all built-in.
That experience helped me develop a mindset not just about control, but about constantly looking for ways to eliminate redundancy, automate smartly, and reduce costs – without sacrificing reliability or developer experience. That, to me, is the core of great engineering: when technical decisions directly support the financial health of the business.