Does the “Root Cause” Even Matter?

I’ve been working on data processing and software for most of my career. One of my best skills in my job is finding the root cause of client-reported data errors in our products. A ticket gets created by client support and bounces around to multiple people in my department, who look at our systems and our inputs and cannot see why input A is being translated to output B. The ticket then typically makes it to my desk and, in relatively short order, I can find where the error occurred and document the precise steps to fix it. If this task could make up 100% of my day job, I would have a great deal more satisfaction with my position than I do now.

Generally, I am able to do this by having spent many years learning and working with our systems and figuring out how they work. In my spare time, I test things, look for anomalies or changes in outputs over time and teach myself. In short, I’m curious about how things work. I have tried multiple times to document my procedure for how I do what I do, but without my knowledge and curiosity, nobody else would really be able to replicate my process. There are just things I know to be true, because they are true, but I cannot prove them to you if you don’t trust or believe me. It is just how I look at systems, how information moves through them, the series of if/then transitions that I know from our history, decisions that were made at different points in time and why they were made in that way, etc. All of this exists in my head and informs how I do my job.

Which is why I have been known to get a little bent out of shape on occasion when folks without this historic knowledge (either because they haven’t been with the company that long or because they have forgotten) make decisions that are seemingly based on criteria that feel foreign or historically out of context. These people aren’t bad and they aren’t ill-intentioned, but they don’t necessarily have all the information at the ready to come to the conclusion that will satisfy all requirements. They haven’t necessarily asked all the right people the right questions. If we collectively decide to scrap ALL of our old rules and start again from scratch, that’s fine, but we definitely should not do so without knowing that’s what we are, in fact, doing.

I see systems on earth and in society much the same way. The government is a series of systems at different levels, as is the economy. Each social media product is a system and the internet as a whole is a system. The universe is a system containing our solar system. And even within our own selves, we have circulatory, respiratory, vascular, skeletal and endocrine systems, and many others. Things are very complex out there.

But specifically, I want to talk about what happens when there are bugs in the system, any system. I’m not talking about insects, which are, in fact, vital to our earth’s ecosystem (there’s another system). I’m talking about bugs in the software sense… things that go wrong in the code. Most of the time, when bugs are identified, you want to find the root cause, the action or piece of code that caused the error downstream in the output. Then you can fix it, wait for the process to refresh and the bug will have been corrected. Easy peasy.

But what if the system is too complex and there are possibly multiple causes, possibly no “root” to bring about a simple fix? A real-life example would be the slow-moving climate disaster we are stumbling blindly(?) into. Human psychology, societal structures and economic priorities have shown there are layers upon layers of complexity to this issue, leading to our complete and total failure to resolve it back when it was first considered to be a problem over a half-century ago. Even as root causes were identified by scientists, others were being paid to say the opposite, casting doubt, creating plausible deniability for those who had the political and/or economic power to do something to set us on a more sustainable path of progress. Today, even if we were able to make those decisions to fix the original bugs in the system, it would not be enough; those actions would no longer have much of an impact at all, given the time that has passed and the metastasis of the problems. We are dying from our own inability to act in rational ways, in a reasonable time frame. This is also, not coincidentally, precisely what happened with the pandemic.

So what to do now? I propose a complete reassessment of the usefulness of root cause analysis as a discipline. Yes, I realize this will potentially destroy much of the joy I still get from my job, but that’s small potatoes and this definitely is NOT about me. I’m challenging the usefulness of the skills that I myself have. If you can’t get people in a society to act to avoid an impending crisis until the full-blown crisis is in their faces, what help is it to have spent any time at all identifying what needed to be done in the past, if only we could have pulled our collective shit together? It’s just not working.

What should we do instead, you may ask? Prioritize the general welfare of the system and make every decision based on whether it will increase or decrease the complexity of that system. This is a universal suggestion, by the way. If we want to be able to solve problems in the future that haven’t even happened yet, the likelihood we will be able to do so with the complexity of the systems we have now is next to none. If we continue to build layers upon layers of bullshit on top of the bullshit we already need to wade through for no good purpose, or for the sole purpose of enriching small numbers of people at the expense of others, we will absolutely ensure our own demise. We must simplify. Simplify our lives, families, cultures, societies, governments, economies, etc. All man-made systems need to be harmonized so that we can act to solve problems more efficiently in the future.

For my job, this proposal SHOULD be easy (yeah, right). I just have to find people in key positions who understand what I’m saying and start making decisions going forward with the system itself as its own stakeholder. Policies cannot be changed without considering whether the outcome will result in more or less complexity in the system. Conversely, every opportunity for change that will lower the level of complexity should be strongly considered, if not adopted outright immediately. Over time, doing these things will make the system itself more stable, more sustainable, easier to work with, more efficient, more accurate, more responsive and will CONTINUOUSLY LOWER stress levels of the employees, as opposed to what we have now, which is a one-way ratchet seemingly attempting to stretch and break every last one of us. I can see all of this happening so clearly because I AM the canary in the coal mine. I am the “weakest link” in some ways. I may be able to find a root cause in my sleep, but I would rather not play Whack-a-Mole until I want to die. There’s a better way to get from here to there. We just have to decide we want to make decisions that way.
