Waste in software development

The Toyota Production System

I first read The Toyota Way around 2011. The book introduced me to the Toyota Way principles and the Toyota Production System (TPS), and it laid the foundation for understanding the Continuous Delivery book, which I read later.

The Toyota Way also laid the groundwork for me to understand the related field of Lean manufacturing, particularly Lean services (which applies Lean concepts to the service industry). However, since there is a lot of confusion about how the various Lean concepts relate to TPS, I will stick to TPS for the rest of this article.

Waste in TPS

Taiichi Ohno, the father of TPS, introduced the concept of “muda” at Toyota. Muda is Japanese for “waste”. Wasteful tasks don’t add any immediate value for the customer. TPS is commonly described as having eight types of waste (Ohno’s original seven, plus an eighth that was added later):

1. Waste of overproduction (largest waste). Production ahead of demand.
2. Waste of time on hand (waiting). Waiting for the next production stage.
3. Waste of transportation. Moving products that are not required to perform the processing.
4. Waste of processing itself. Unnecessary activity resulting from poor tool or product design.
5. Waste of excess inventory. Components, work-in-process, and finished products that are not being processed.
6. Waste of movement. People or equipment moving or walking more than is required to perform the processing.
7. Waste of making defective products. The effort spent inspecting for and fixing defects.
8. Waste of underutilized workers. Underutilizing people’s talents, skills, and knowledge.

Software development waste

Software development is not like making cars. For example, while car manufacturing deals with transporting physical items (waste no. 3 above), software development rarely does. But that does not mean there is no waste in software development. If we squint a little, the Toyota Production System’s definition of waste turns out to be a surprisingly fruitful analogy for understanding where waste happens when making software. Let’s go through each type of waste and see what it means for software:

Waste of overproduction. In software, this usually means “producing” too many features or changes before deploying to production. In other words, batching up too many changes into one deployment. This leads to slower feedback cycles and usually more defects.

Reducing the DORA metric “Lead Time for Changes” reduces this type of overproduction.
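To make the metric concrete, here is a minimal sketch of how “Lead Time for Changes” could be computed, assuming it is measured as the time from commit to production deployment. The `lead_times` helper and the sample timestamps are hypothetical, made up for illustration, and the median is just one reasonable way to summarize the result.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(changes):
    """Return the commit-to-deploy duration for each (committed_at, deployed_at) pair."""
    return [deployed_at - committed_at for committed_at, deployed_at in changes]


# Hypothetical sample data: (commit time, production deploy time).
changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),   # deployed after 2 hours
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 3, 10, 0)),  # waited 48 hours in a batch
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 3, 10, 0)),  # waited 20 hours in the same batch
]

# The median is less sensitive to a single slow outlier than the mean.
median_lead_time = median(lead_times(changes))
print(median_lead_time)
```

The last two changes went out in the same deployment, which is what batching looks like in the data: the older a change is when the batch finally ships, the longer its lead time.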

Waste of time on hand (waiting). This includes things like waiting for someone else to review your code. Pair or mob programming can be a solution: it usually reduces lead time because someone else reviews the code while it is being written.

Just extracted some stats from code reviews at @tink using https://t.co/12cWRJp3o6. (The data for the rightmost bar consists of two samples, so let's ignore that one for now :-P) pic.twitter.com/TLYncFZDfU
— Jens Rantil (@JensRantil) October 10, 2019

If you need additional review steps to deploy to production (for example, a code review to merge into a “production” branch), that also adds to this type of waste. If you’ve ever worked with this type of workflow, I’m sure you’ve heard something like

“I think maybe Noah modified something so we need to check with her before deploying.”

Another related waste could happen if you are working with multiple source code repositories and must wait on code review in one repository before someone can review/merge code in another repository.

Waiting time also includes compilation time and the time it takes to run tests locally on a development machine.

Not to mention waiting on Jira’s user interface to load…

Atlassian claims they have 180.000 #Jira customers (https://t.co/6O9Md54uVX.). Assuming an average of 25 users per customer, each user make 30 clicks in Jira every day and Jira's incredibly slow UI takes 5 seconds per operation, that's more than 21 man-years wasted per day.
— Jens Rantil (@JensRantil) April 8, 2021
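The arithmetic behind that estimate can be checked in a few lines. All inputs below are the assumptions stated in the tweet, not measured values, and the result is expressed in calendar years (24-hour days), which is the reading that matches the tweet’s figure.

```python
# Assumptions taken from the tweet above.
customers = 180_000
users_per_customer = 25
clicks_per_user_per_day = 30
seconds_per_click = 5

# Total seconds spent waiting on the UI across all users in one day.
seconds_wasted_per_day = (
    customers * users_per_customer * clicks_per_user_per_day * seconds_per_click
)

# Convert to calendar years (365 days of 24 hours).
seconds_per_year = 365 * 24 * 60 * 60
years_wasted_per_day = seconds_wasted_per_day / seconds_per_year

print(f"{seconds_wasted_per_day:,} seconds per day")
print(f"{years_wasted_per_day:.1f} years per day")
```

That comes to 675,000,000 seconds per day, or a bit over 21 years.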

Waste of transportation. I usually think of “code refactoring” here. Ohno defined two types of muda:

Muda Type I: non-value-adding, but necessary for end-customers.
Muda Type II: non-value-adding and unnecessary for end-customers.

Some refactorings are needed to build a new feature (type I), while other refactorings are not needed at all (type II). A refactoring is usually type II when engineers struggle to describe why the change is needed from the customer’s perspective. This is why I think encouraging sentences like “Refactoring to be able to…” (type I) is important.

I also think managerial tasks such as “filling out quarterly reports in spreadsheets for management” can end up in this category.

Waste of processing itself includes building a feature that the customer doesn’t want. Quick feedback loops help here!

Kent Beck once stated:

“Make It Work, Make It Right, Make It Fast” (ref)

Making something maintainable is an example of waste if the solution doesn’t work. Making it performant is usually a waste if it is not maintainable.

Waste of excess inventory. I think of “inventory” as either code or tracked issues:

All code is a liability and needs maintenance. Refactorings become more tedious the more code you have. Also, every single line you add to an application adds complexity. Removing code is a great way to remove bugs.

An issue-tracking system is where feature ideas go to die.

Managing issues/tickets (in Jira etc.) does not add any immediate customer value. The more tickets you have, the more time you spend labeling and sorting them, updating descriptions of tickets that might never be worked on, etc.

Waste of movement. In terms of software development, I think of this waste as a “waste of switching context”. This is a big one!

Having to switch between communication channels can have a detrimental impact on productivity: Slack, e-mail, issue-tracking systems such as Jira, GitHub issues, wikis, Miro boards, document systems (Office 365, Google Docs), and meetings. On an organizational level, I haven’t seen many companies trying to reduce these.

Unnecessary meetings are another form of unnecessary movement. If a meeting can happen through async communication, you don’t have to context switch as much.

A diverse set of technologies can lead to a lot of context switching: switching between different frameworks, libraries, and infrastructure components, each with its own caveats and documentation. I could write a lot about this, but for now I will simply refer to Radical Simplicity and Choosing Boring Tech, and remind us to beware of hype-driven development. The first Google SRE book also has a good chapter on simplicity in terms of reliability.

Many companies require a lot of movement in the process of developing software. Here are some:

Switching between a ticketing system, GitHub, terminal, and code editor.
Always having to create a ticket for every pull request.
Having to often switch between source code repositories. A monorepo can reduce or avoid this.
Needing to jump into a database to change things.
Needing to jump around in many different files to create a new package/microservice/API endpoint.
Releasing/packaging a library and then jumping to another place to start using it. I’m looking at you, npm, Maven, Gradle, et al… This is where a monorepo can shine.

Finally, certain companies require lots of movement to make a release to customers: making a release in one place and writing release notes in Slack or Jira, going through required checklists, making a second pull request to merge into production, clicking an extra button and waiting (waste!) to deploy to production… The more often you do something, the less movement you should strive for. It adds up over time…

Generally, a standardized set of movements for performing a task is better than constantly having to figure out which movements are needed. For example, once you have standardized the steps needed to create a microservice, you can take a more structured approach to reducing those steps. In other words, it is usually more important to first reduce the variability of “Lead Time for Changes” before you take a stab at reducing the actual lead time.
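To illustrate the difference between lead time and its variability, here is a small sketch comparing two hypothetical processes with the same average lead time. The sample numbers are invented, and using the coefficient of variation (standard deviation divided by mean) as the measure of variability is my assumption for this example, not an established part of the DORA metrics.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return the sample standard deviation relative to the mean (unitless)."""
    return stdev(samples) / mean(samples)


# Hypothetical lead times in hours for two ways of working.
standardized_process = [20, 22, 19, 21, 23]
ad_hoc_process = [2, 60, 5, 30, 8]

for name, samples in [
    ("standardized", standardized_process),
    ("ad hoc", ad_hoc_process),
]:
    print(
        f"{name}: mean={mean(samples):.1f}h "
        f"cv={coefficient_of_variation(samples):.2f}"
    )
```

Both processes average 21 hours, but the ad hoc one is far less predictable. A predictable process is easier to improve because you can tell whether a change to the steps actually moved the lead time or whether you are just looking at noise.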

Waste of making defective products. This is what we mostly call bugs, but it can also include bad user experiences. Many people think of defects primarily in terms of immediate customer impact. That is true, but there is also a secondary impact: constantly going back to fix bugs can seriously hurt velocity.

Waste of underutilized workers. Not utilizing or growing engineering talent is also a waste. Turning engineers into ticket machines, by not allowing them to take initiative or share ownership of the product backlog, can have detrimental effects on product innovation and on the effectiveness of product development.

Underutilizing workers is a good example of where an organization’s culture comes into play. Generally, reducing waste in software development is a socio-technical problem that requires working on processes, people management, and tech.