If you’ve ever tried to take a high-energy Husky for a walk on a short leash, you know exactly what Toil feels like. It’s restrictive, exhausting, and keeps you from actually running toward the fun stuff. In the world of Site Reliability Engineering, we often find ourselves bogged down by “the grunge”—the repetitive, manual tasks that keep us from doing real engineering.
This week, we’re digging into the “Eliminating Toil” chapter of Google’s Site Reliability Engineering book. Grab a treat, get comfortable, and let’s learn how to stop chasing our own tails.
[Image: a happy dog running freely in a field, symbolizing freedom from toil]

What Exactly is “Toil”? (The “Bath Time” of Engineering)
Just like how every dog knows the difference between a fun “walkie” and the dreaded “bath time,” we need to distinguish between Engineering and Toil.
Toil isn’t just work you dislike. It’s work that has these specific “mutt” traits:
- Manual: Like hand-brushing a shedding Golden Retriever every single day.
- Repetitive: If you’re solving the same problem for the tenth time, you’re just a dog chasing its tail.
- Automatable: If a machine (or an automatic ball launcher) could do it, a human shouldn’t be stuck doing it.
- Tactical & Reactive: It’s “firefighting” (or barking at the mailman). It’s interrupt-driven, not strategy-driven.
- No Enduring Value: Once the task is done, the state of the world hasn’t improved. You’re just back to where you started.
[Image: the feeling of “bath time” toil]

The 50% Rule: Keeping the Pack Healthy
At Google, the goal is to keep Toil below 50% of an SRE’s time. Why? Because a dog that spends 100% of its time guarding the house never gets to go to the park to learn new tricks.
The rest of an SRE’s time, at least 50%, should go to Engineering work: the kind that is novel, requires human judgment, and produces permanent improvements. This is how we build “fences” that work so well we don’t have to stay outside barking all night.
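To see where your own pack stands against that 50% line, a rough tally of logged hours is enough. Here is a minimal Python sketch. The `TimeEntry` record, the `toil_fraction` helper, and the sample week are all made-up assumptions for illustration, not anything prescribed by Google or the SRE book.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record of logged work; adapt the fields to your own tracker.
@dataclass
class TimeEntry:
    hours: float
    is_toil: bool

TOIL_CAP = 0.5  # the 50% goal described above

def toil_fraction(entries: List[TimeEntry]) -> float:
    """Return the share of logged hours that were spent on toil."""
    total = sum(entry.hours for entry in entries)
    if total == 0:
        return 0.0  # nothing logged, so nothing to report
    toil = sum(entry.hours for entry in entries if entry.is_toil)
    return toil / total

# Made-up sample week: 24 hours of toil, 16 hours of engineering.
week = [
    TimeEntry(hours=14, is_toil=True),
    TimeEntry(hours=10, is_toil=True),
    TimeEntry(hours=16, is_toil=False),
]

fraction = toil_fraction(week)
print(f"toil share: {fraction:.0%}, over the cap: {fraction > TOIL_CAP}")
```

Deciding what counts as toil is still a human call. The script only adds up the hours you have already labeled.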
Why Too Much Toil is a “Bad Dog”
If we let Toil take over the kennel, things get messy fast:
- Career Stagnation: You can’t make a career out of “grunge.” If you only do manual work, your skills won’t grow.
- Low Morale: Even the best-behaved pup gets grumpy if they never get to play. Too much toil leads to burnout.
- Confusion: It makes people think SREs are just “ops” teams who handle manual tasks, rather than an engineering organization.
- Attrition: Your best engineers—the “Best in Show” types—will leave for a more rewarding job if they’re stuck doing boring, manual work.
Is Toil Always Bad?
Not necessarily! A little bit of toil is like a quick grooming session—it’s unavoidable and can actually be quite calming in small doses. It provides a sense of accomplishment and quick wins. But remember: Toil becomes toxic when experienced in large quantities.
The “Is Your SRE Team Chasing Its Tail?” Toil Indicators Checklist
Every great dog owner knows the signs when their furry friend isn’t thriving. Similarly, every SRE manager needs to recognize the indicators of excessive toil before it leads to burnout or a “ruff” experience for the team.
Use this checklist to identify potential “toil hotspots” in your SRE environment. The more “yes” answers you have, the more likely your team is bogged down by manual, repetitive work that could be automated or eliminated.
Toil Indicator Checklist
Manual & Repetitive Toil (The Endless Fetch Game)
- Is your team performing the same task more than once a week (e.g., repeatedly restarting a service or manually checking logs for common issues)?
- Are SREs spending significant time on “ticket ops” – triaging, assigning, and resolving basic, predictable tickets that don’t require novel problem-solving?
- Do SREs manually copy-paste data between systems or manually generate reports that could be automated?
- When onboarding a new team member, is there a long list of manual steps they must perform to get access or set up their environment?
[Image: the feeling of endless, repetitive tasks]

Reactive & Tactical Toil (The Constant Barking)
- Are SREs frequently interrupted by alerts that are easily resolved with a standard, manual procedure (e.g., “We just restart it when X happens”)?
- Does your team spend a large percentage of their time “firefighting” urgent, unexpected issues rather than planning and executing proactive improvements?
- Are SREs frequently responding to requests for information or actions that other teams could handle themselves (e.g., “Can you restart my service?” when a self-service button would do)?
- Is your on-call rotation dominated by incidents that don’t lead to long-term solutions or post-mortem action items?
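If “we just restart it when X happens” sounds familiar, that standard procedure is a good candidate to hand to a machine. The sketch below shows one possible shape for it in Python: a small runbook table that maps known alerts to automated fixes and sends everything else to a human. The alert name, the `restart_service` placeholder, and the `handle_alert` function are hypothetical, and a real handler would call whatever orchestrator or paging tool you actually use.

```python
from typing import Callable, Dict

def restart_service(service: str) -> str:
    # Placeholder only: a real handler would ask your orchestrator
    # or init system to restart the service.
    return f"restarted {service}"

# Hypothetical runbook: alerts with a known, safe, standard fix.
RUNBOOK: Dict[str, Callable[[str], str]] = {
    "worker_stuck": restart_service,
}

def handle_alert(alert_name: str, service: str) -> str:
    """Apply the automated fix if one exists, otherwise escalate."""
    handler = RUNBOOK.get(alert_name)
    if handler is None:
        # No standard procedure on file, so this needs human judgment.
        return f"page on-call: {alert_name} on {service}"
    return handler(service)

print(handle_alert("worker_stuck", "billing"))
print(handle_alert("disk_full", "billing"))
```

Only the alerts with a boring, well-understood fix belong in the table. Anything novel should still wake up a person.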
Lack of Enduring Value (Digging Holes for No Reason)
- Do tasks feel like “busy work” that, once completed, don’t fundamentally change the system or improve its long-term reliability/efficiency?
- Are SREs maintaining legacy systems or processes that offer diminishing returns for the effort invested?
- Are there frequent “one-off” requests from other teams that require manual intervention and don’t contribute to scalable solutions?
Operational Overhead (The Never-Ending Paperwork for Your Dog’s License)
- Does your team spend excessive time on administrative tasks, meetings about basic operational issues, or detailed status reports for routine activities?
- Are SREs frequently involved in tasks outside their core SRE responsibilities (e.g., frontline customer support, manual QA, project management for non-SRE projects)?
- Is the documentation for operational procedures constantly out of date, requiring SREs to “figure things out” each time?
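Once the team has answered the checklist, counting the “yes” answers per category shows where the toil is piling up. Here is a small Python sketch of that tally. The category keys and the sample answers are invented for illustration, and `toil_hotspots` is just a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, Tuple

def toil_hotspots(answers: Iterable[Tuple[str, bool]]) -> Counter:
    """Count the "yes" answers in each checklist category."""
    return Counter(category for category, yes in answers if yes)

# Made-up answers from one team: (category, answered "yes"?)
answers = [
    ("manual_repetitive", True),
    ("manual_repetitive", True),
    ("manual_repetitive", False),
    ("reactive_tactical", True),
    ("no_enduring_value", False),
    ("operational_overhead", False),
]

hotspots = toil_hotspots(answers)
# Categories with the most "yes" answers come first.
for category, yes_count in hotspots.most_common():
    print(f"{category}: {yes_count} yes")
```

The category at the top of the list is a reasonable place to start your automation work.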
Time to Unleash Your Team!
If this checklist made you feel like you need to take your SRE team for a long run in an open field, don’t worry! Recognizing toil is the first step. Now, let’s get those paws moving towards automation and genuine engineering.
The Conclusion: Let’s Invent More and Toil Less
The goal of a great SRE pack is to steadily clean up our services through good engineering. By automating the boring stuff, we shift our collective efforts toward architecting the next generation of services.
Let’s commit to eliminating a bit of toil every week. After all, wouldn’t you rather be building a better dog park than just picking up after everyone else?