Generalization Takes Three Examples

As new programmers, we're often taught that one general solution is better than several specific ones. The reasoning is that it's more effective to write one function that solves multiple problems than a separate function for each problem.

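This instinct is based on a prediction about the future. Asked to find a red sign, for example, we assume we'll likely need to find signs of other colors later, so we write a general sign finder instead of a red-sign finder.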
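To make that instinct concrete, here is a minimal Python sketch of the kind of general solution someone might write up front when the only real requirement is "find a red sign." Everything in it is an assumption made for illustration: the Sign fields, the SignQuery fields, and the find_sign helper.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


@dataclass
class SignQuery:
    """Speculative general query: each field is a guess about future needs."""
    colors: set[str] = field(default_factory=set)  # empty means "any color"
    text: Optional[str] = None                     # substring to match, if any
    near: Optional[tuple[float, float]] = None     # center point, if any
    max_distance: float = 0.0                      # only used together with `near`

    def matches(self, sign: Sign) -> bool:
        if self.colors and sign.color not in self.colors:
            return False
        if self.text is not None and self.text not in sign.text:
            return False
        if self.near is not None:
            distance = hypot(sign.x - self.near[0], sign.y - self.near[1])
            if distance > self.max_distance:
                return False
        return True


def find_sign(query: SignQuery, signs: list[Sign]) -> Optional[Sign]:
    """Return the first sign matching the query, or None."""
    for sign in signs:
        if query.matches(sign):
            return sign
    return None


signs = [Sign("green", "Main St", 0.0, 0.0), Sign("red", "STOP", 3.0, 4.0)]

# The only requirement that actually exists so far: find a red sign.
red = find_sign(SignQuery(colors={"red"}), signs)
```

Only the colors field is exercised by the one known use case. The text, near, and max_distance fields are predictions that nothing has confirmed yet.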

#You Aren't Gonna Need It

General-purpose code often anticipates and solves for cases that rarely occur in reality. YAGNI (You Aren't Gonna Need It) is a principle that warns against implementing functionality before it is actually required.

Common scenarios leading to YAGNI violations:

Premature optimization: Adding complexity for performance reasons before identifying performance bottlenecks.
Over-engineering: Building complex systems or adding abstractions for theoretical future needs.
Fear of future changes: Coding defensively for edge cases that may never arise.

Why avoid YAGNI violations:

Wasted effort: Time spent building unused features could have been used more effectively elsewhere.
Increased complexity: Unused code makes the codebase harder to understand and maintain.
Greater risk of bugs: More code means more potential for errors.

How to adhere to YAGNI:

Focus on current requirements: Build only what is needed to satisfy the current scope.
Iterative development: Add features incrementally as they become necessary.
Rely on refactoring: Trust that your codebase can evolve over time to accommodate new requirements.

#Adapting code to new use cases

You might ask, "Doesn't writing code that only meets the current requirements guarantee that new use cases won't be handled? What do you do when a new use case doesn't fit the code?"

But no, that's not necessarily true. When a new use case arises, you can simply write code to handle it. You can copy and modify your first effort or start from scratch - both options are fine.

For example, the first use case was "Find a red sign." I wrote code to do just that.
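A minimal Python sketch of what "just that" could look like. The Sign type and its fields are assumptions for illustration, and find_red_sign is a hypothetical snake_case stand-in for the first findSign function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


def find_red_sign(signs: list[Sign]) -> Optional[Sign]:
    """Return the first red sign, or None if there isn't one."""
    for sign in signs:
        if sign.color == "red":
            return sign
    return None


signs = [Sign("green", "Main St", 0.0, 0.0), Sign("red", "STOP", 3.0, 4.0)]
first_red = find_red_sign(signs)
```

There is no color parameter and no query object. The function does one thing, which makes it easy to read and easy to throw away.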

The second use case was "Find a sign near Main Street and Barr Street." Now I'll write code to do just that.
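Again as a sketch only: the function below handles this one use case and nothing else. The INTERSECTIONS table is a hypothetical stand-in for whatever map data resolves two street names to a point, and the Sign type and max_distance parameter are likewise assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


# Hypothetical lookup table: where pairs of streets cross (made-up coordinates).
INTERSECTIONS = {
    frozenset({"Main Street", "Barr Street"}): (10.0, 20.0),
}


def find_sign_near_intersection(
    signs: list[Sign], street_a: str, street_b: str, max_distance: float
) -> Optional[Sign]:
    """Return the first sign within max_distance of the intersection, or None."""
    center = INTERSECTIONS.get(frozenset({street_a, street_b}))
    if center is None:
        return None  # unknown intersection
    for sign in signs:
        if hypot(sign.x - center[0], sign.y - center[1]) <= max_distance:
            return sign
    return None


signs = [Sign("red", "STOP", 100.0, 100.0), Sign("white", "25 MPH", 11.0, 21.0)]
nearby = find_sign_near_intersection(signs, "Barr Street", "Main Street", 5.0)
```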

The third use case was "Find a red sign near 212 South Water Street." This example is not handled by the previous two functions.
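One way to handle it is a third narrow function that checks color and location together. This is a hedged sketch: the ADDRESSES table is a hypothetical stand-in for a real geocoder, and the Sign type and max_distance parameter are assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


# Hypothetical geocoding table: street address -> position (made-up coordinates).
ADDRESSES = {"212 South Water Street": (40.0, 5.0)}


def find_red_sign_near_address(
    signs: list[Sign], address: str, max_distance: float
) -> Optional[Sign]:
    """Return the first red sign within max_distance of the address, or None."""
    center = ADDRESSES.get(address)
    if center is None:
        return None  # unknown address
    for sign in signs:
        is_red = sign.color == "red"
        is_near = hypot(sign.x - center[0], sign.y - center[1]) <= max_distance
        if is_red and is_near:
            return sign
    return None


signs = [
    Sign("red", "STOP", 0.0, 0.0),        # red, but far away
    Sign("white", "25 MPH", 41.0, 5.0),   # close, but not red
    Sign("red", "YIELD", 42.0, 6.0),      # red and close
]
match = find_red_sign_near_address(signs, "212 South Water Street", 5.0)
```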

This is a turning point. With three separate use cases, we can make a more accurate prediction about the next two.

Is three a magic number? Not really, except that it's more than one or two. One example isn't enough to identify a pattern. Two examples give us a false sense of security.

With three examples, our prediction will be more accurate, and we're less likely to make a mistake. We've learned to be humble after being wrong twice!

However, there's no obligation to generalize at this point. It's okay to write a new function that's separate from the first two.

This approach works fine as long as your sign-finding use cases only involve checking a single color, a single location, or both. However, it won't scale well if you need to add more criteria. Three optional criteria, for instance, allow seven non-empty combinations, and seven variants of the findSign function would be unwieldy.

You can merge the three findSign functions into one that handles all three cases. However, only do this if it makes the code easier to write and read based on the use cases you have in hand.
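A merged version might look like the following Python sketch. It assumes the three use cases reduce to an optional color and an optional location. The Sign type, the parameter names, and the snake_case find_sign name are illustrative assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


def find_sign(
    signs: list[Sign],
    color: Optional[str] = None,
    near: Optional[tuple[float, float]] = None,
    max_distance: float = 0.0,
) -> Optional[Sign]:
    """Return the first sign that passes every filter the caller supplied."""
    for sign in signs:
        if color is not None and sign.color != color:
            continue
        if near is not None:
            if hypot(sign.x - near[0], sign.y - near[1]) > max_distance:
                continue
        return sign
    return None


signs = [
    Sign("green", "Main St", 0.0, 0.0),
    Sign("red", "STOP", 50.0, 50.0),
    Sign("red", "YIELD", 3.0, 4.0),
]

by_color = find_sign(signs, color="red")                          # use case 1
by_location = find_sign(signs, near=(0.0, 0.0), max_distance=1.0)  # use case 2
by_both = find_sign(signs, color="red", near=(0.0, 0.0), max_distance=6.0)  # use case 3
```

The merged function covers exactly the three cases in hand and nothing more. If it reads worse than the three separate functions did, keeping them separate is still a reasonable choice.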

#The danger of generalization

You've seen that generalizing too soon can lead to writing code that's never used and is harder to adapt to new cases. This is partly because complicated code takes more work to change. There's also something more subtle that happens. Once you start generalizing, you're more likely to keep extending that idea instead of rethinking it.

Let's go back in time. Suppose you had written a general-purpose SignQuery class up front, one query object meant to cover every sign search, and then considered these use cases:

Find a red sign.
Find a red "STOP" sign near Main Street and Barr Street.
Find all red or green signs on Main Street.
Find all white signs with "MPH" on Wabash Avenue or Water Street.
Find a sign with "Lane" or colored blue near 902 Mill Street.

The first two cases fit SignQuery well, but things get tricky from there.

The third case, "Find all red or green signs on Main Street," adds two new requirements. First, it needs to return all matching signs, not just one. That's doable.
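Returning every match instead of the first one is a small change. Here is a hedged Python sketch that covers only the color part of the query. The Sign type, this cut-down SignQuery, and find_signs are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sign:
    # Hypothetical sign record: a color, the printed text, and a position.
    color: str
    text: str
    x: float
    y: float


@dataclass
class SignQuery:
    # Only the color filter is sketched here; an empty set means "any color".
    colors: set[str] = field(default_factory=set)

    def matches(self, sign: Sign) -> bool:
        return not self.colors or sign.color in self.colors


def find_signs(query: SignQuery, signs: list[Sign]) -> list[Sign]:
    """Return every matching sign, in input order, rather than just the first."""
    return [sign for sign in signs if query.matches(sign)]


signs = [
    Sign("red", "STOP", 0.0, 0.0),
    Sign("white", "25 MPH", 1.0, 0.0),
    Sign("green", "Main St", 2.0, 0.0),
]
red_or_green = find_signs(SignQuery(colors={"red", "green"}), signs)
```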

Second, we need to find all signs along a street. This is more challenging. Let's assume streets are a series of line segments connecting locations. We can bundle locations and streets into a single "Area" structure.
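One possible shape for such a structure, sketched in Python under stated assumptions: an Area holds one or more points plus a maximum distance, where a single point represents a location and several points represent a street as connected segments. The names Area, Point, and distance_to_segment are hypothetical.

```python
from dataclasses import dataclass
from math import hypot

Point = tuple[float, float]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a and b are the same point.
        return hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line, then clamp the projection to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


@dataclass(frozen=True)
class Area:
    # Assumed to hold at least one point.
    # One point = a location; several points = a street made of segments.
    points: tuple[Point, ...]
    max_distance: float

    def contains(self, p: Point) -> bool:
        pts = self.points
        # A single point is treated as a zero-length segment.
        segments = list(zip(pts, pts[1:])) or [(pts[0], pts[0])]
        return any(
            distance_to_segment(p, a, b) <= self.max_distance for a, b in segments
        )


main_street = Area(points=((0.0, 0.0), (10.0, 0.0), (10.0, 10.0)), max_distance=1.0)
corner = Area(points=((3.0, 4.0),), max_distance=2.0)
```

Notice how much machinery the query has picked up to serve one extra use case.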

The fourth use case asks for all speed-limit signs on either of two streets, which doesn't fit a query that holds a single area. It's easy enough to support a list of areas.
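A hedged Python sketch of that extension: the query keeps a list of areas and a sign passes if it falls inside any of them. The Sign, Area, and SignQuery definitions here are compact, hypothetical versions written so the example runs on its own, and the street coordinates are made up.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import Optional

Point = tuple[float, float]


@dataclass(frozen=True)
class Sign:
    color: str
    text: str
    x: float
    y: float


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


@dataclass(frozen=True)
class Area:
    points: tuple[Point, ...]  # assumed non-empty; several points form a street
    max_distance: float

    def contains(self, p: Point) -> bool:
        pts = self.points
        segments = list(zip(pts, pts[1:])) or [(pts[0], pts[0])]
        return any(distance_to_segment(p, a, b) <= self.max_distance for a, b in segments)


@dataclass
class SignQuery:
    colors: set[str] = field(default_factory=set)    # empty means "any color"
    text: Optional[str] = None                       # substring to match, if any
    areas: list[Area] = field(default_factory=list)  # empty means "anywhere"

    def matches(self, sign: Sign) -> bool:
        if self.colors and sign.color not in self.colors:
            return False
        if self.text is not None and self.text not in sign.text:
            return False
        # A sign passes the location test if it lies in ANY listed area.
        if self.areas and not any(a.contains((sign.x, sign.y)) for a in self.areas):
            return False
        return True


def find_signs(query: SignQuery, signs: list[Sign]) -> list[Sign]:
    return [sign for sign in signs if query.matches(sign)]


wabash_avenue = Area(points=((0.0, 0.0), (0.0, 100.0)), max_distance=2.0)
water_street = Area(points=((50.0, 0.0), (50.0, 100.0)), max_distance=2.0)
signs = [
    Sign("white", "25 MPH", 1.0, 10.0),    # on Wabash Avenue
    Sign("white", "35 MPH", 50.0, 40.0),   # on Water Street
    Sign("white", "45 MPH", 25.0, 25.0),   # on neither street
    Sign("white", "ONE WAY", 0.0, 5.0),    # on Wabash Avenue, but no "MPH"
    Sign("red", "STOP", 0.0, 0.0),         # wrong color
]
query = SignQuery(colors={"white"}, text="MPH", areas=[wabash_avenue, water_street])
speed_limits = find_signs(query, signs)
```

Each extension is easy on its own, which is exactly why it is tempting to keep going.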

#Avoiding premature generalization

Generalized solutions can be sticky. Once you create an abstraction, it's hard to think of alternatives. You might want to use the same solution for all similar problems.

When you encounter a problem that doesn't fit, it's natural to want to extend your existing solution. But this can make the solution more complex over time. The danger is that you might not notice when your solution has gone too far.

When you have a hammer, everything looks like a nail. Creating a general solution can be like handing out hammers. Only do it when you're sure you have a real need.