Clean Code

April 02, 2021

Table of Contents

NOTE: TOC and these notes do not strictly correspond to the order used in the book itself.

Clean Code

Code is really the language in which we ultimately express the requirements.
It is unprofessional for programmers to bend to the will of managers who don’t understand the risks of making messes.
We are authors. The ratio of time spent reading vs. writing is well over 10:1. Making code easy to read actually makes it easier to write.
The Boy Scout rule: Leave the campground cleaner than you found it. Isn’t continuous improvement an intrinsic part of professionalism?

What Is Clean Code? (Answers from experts)

Bad code tempts the mess to grow! When others change bad code, they tend to make it worse.
Clean code is focused. Each function, each class, each module exposes a single-minded attitude that remains entirely undistracted, and unpolluted, by the surrounding details.
Clean code can be read, and enhanced by a developer other than its original author. It has unit and acceptance tests. It has meaningful names. It provides one way rather than many ways for doing one thing. It has minimal dependencies, which are explicitly defined, and provides a clear and minimal API. Code should be literate since depending on the language, not all necessary information can be expressed clearly in code alone.
Runs all the tests; Contains no duplications; Expresses all the design ideas that are in the system; Minimizes the number of entities such as classes, methods, functions, and the like.

Prequel and Principles

Single Responsibility Principle: every module, class, or function should have responsibility over a single part of the program’s functionality, and it should encapsulate that part.
Open Closed Principle: software entities (modules, classes, functions) should be open for extension but closed for modification.
Dependency Inversion Principle: Classes should depend on abstractions, not on concrete details.

Meaningful Names

Use Intention-Revealing Names. Choosing good names takes time but saves more time than it takes.
Avoid Disinformation
Meaningful Distinctions. It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names must be different, then they should also mean something different.
Use Pronounceable Names. Intelligent conversation is now possible.
Use Searchable Names.
- Single-letter names and numeric constants have a particular problem in that they are not easy to locate across a body of text.
- The length of a name should correspond to the size of its scope.
Avoid Encodings
- Hungarian Notation
- Member Prefixes
- I prefer to leave interfaces unadorned (no I prefix).
Avoid Mental Mapping. One difference between a smart programmer and a professional programmer is that the professional understands that clarity is king.
Class Names: nouns or noun phrases. Avoid words like Manager, Processor, Data, or Info in the name of a class.
Method Names: verbs or verb phrases. Accessors, mutators, and predicates should have a get, set, or is prefix.
Don’t Be Cute
- Don’t tell little culture-dependent jokes.
- Say what you mean. Mean what you say.
- Pick One Word per Concept. Using the same term for two different ideas is essentially a pun.
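
The naming advice above can be sketched with a small example. The game-board domain, the class GameBoard, and the constants STATUS_VALUE and FLAGGED are hypothetical and only serve to show intention-revealing, searchable names; this is an illustration, not a prescribed design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a board of cells where a status of 4 means "flagged".
class GameBoard {
    static final int STATUS_VALUE = 0; // index of the status field inside a cell
    static final int FLAGGED = 4;      // searchable constant instead of a bare 4

    private final List<int[]> cells = new ArrayList<>();

    void addCell(int[] cell) {
        cells.add(cell);
    }

    // An unclear version might be called getThem() and test x[0] == 4.
    // Here the names state what is collected and why.
    List<int[]> getFlaggedCells() {
        List<int[]> flaggedCells = new ArrayList<>();
        for (int[] cell : cells) {
            if (cell[STATUS_VALUE] == FLAGGED) {
                flaggedCells.add(cell);
            }
        }
        return flaggedCells;
    }
}
```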

Functions

Duplication may be the root of all evil in software.
Writing clean software is like any other kind of writing. When you write a paper or an article you get your thoughts down first, then you massage it until it reads well.
Master programmers think of systems as stories to be told rather than programs to be written. They use the facilities of their chosen programming languages to construct a much richer and more expressive language that can be used to tell that story.
Your real goal is to tell the story of the system.

Small!

Functions should not be 100 lines long. Functions should hardly ever be 20 lines long.
Every function in this program was just two, or three, or four lines long. Each was transparently obvious. Each told a story. And each led you to the next in a compelling order.
The blocks within if, else, while etc. statements should be one line long.
The indent level of a function should not be greater than one or two.

Do One Thing

FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY.
Sections within Functions: This is an obvious symptom of doing more than one thing. Functions that do one thing cannot be reasonably divided into sections.
One Level of Abstraction per Function

Use Descriptive Names

The smaller and more focused a function is, the easier it is to choose a descriptive name.
Don’t be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment.

Function Arguments

The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification - and then shouldn’t be used anyway.
Arguments are hard. They take a lot of conceptual power.
Arguments are even harder from a testing point of view. If there are no arguments, this is trivial. If there’s one argument it’s not too hard.
assertEquals might be better written as assertExpectedEqualsActual(expected, actual). This strongly mitigates the problem of having to remember the ordering of the arguments.

Output Arguments

Output arguments are harder to understand than input arguments.
Using an output argument instead of a return value for a transformation is confusing. If a function is going to transform its input argument, then the transformation should appear as the return value.
In general output arguments should be avoided. If your function must change the state of something, have it change the state of its owning object (use OOP).
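
A minimal sketch of the three styles mentioned above. The Report and ReportFormatter classes and the footer text are hypothetical, made up purely for contrast.

```java
// Hypothetical Report class, used only to contrast the styles.
class Report {
    private final StringBuilder text = new StringBuilder();

    Report(String body) {
        text.append(body);
    }

    // Preferred: the object changes its own state.
    void appendFooter() {
        text.append("\n-- end of report --");
    }

    String text() {
        return text.toString();
    }
}

class ReportFormatter {
    // Discouraged: 'report' is an output argument. A call such as
    // appendFooter(report) does not reveal that 'report' gets modified.
    static void appendFooter(StringBuilder report) {
        report.append("\n-- end of report --");
    }

    // A transformation exposes its result as the return value.
    static String withFooter(String report) {
        return report + "\n-- end of report --";
    }
}
```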

Flag Arguments

Flag arguments are ugly. Passing a boolean into a function is a truly terrible practice. It does one thing if the flag is true and another if the flag is false!
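
One possible way to remove a flag argument is to split the function in two. The TestPageRenderer class and its method names below are assumptions for illustration only.

```java
// Hypothetical renderer; names and return values are illustrative.
class TestPageRenderer {
    // Discouraged: render(true) tells the reader nothing at the call site.
    String render(boolean isSuite) {
        return isSuite ? renderForSuite() : renderForSingleTest();
    }

    // Preferred: one function per behaviour, each with a telling name.
    String renderForSuite() {
        return "suite page";
    }

    String renderForSingleTest() {
        return "single test page";
    }
}
```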

Argument Objects

Reducing the number of arguments by creating objects out of them may seem like cheating, but it’s not. Likely it is a part of a concept that deserves a name of its own.
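
As a sketch of an argument object: two coordinates that always travel together can become a Point. The Point, Circle, and Shapes types here are hypothetical.

```java
// Hypothetical geometry types, for illustration only.
class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class Circle {
    final Point center;
    final double radius;

    Circle(Point center, double radius) {
        this.center = center;
        this.radius = radius;
    }
}

class Shapes {
    // Before: three loosely related doubles (triadic).
    static Circle makeCircle(double x, double y, double radius) {
        return makeCircle(new Point(x, y), radius);
    }

    // After: x and y form a concept with a name of its own (dyadic).
    static Circle makeCircle(Point center, double radius) {
        return new Circle(center, radius);
    }
}
```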

Have No Side Effects

Side effects are lies. Your function promises to do one thing, but it also does another hidden thing.
Anything that forces you to check the function signature is equivalent to a double-take. It’s a cognitive break and should be avoided.

Command/Query Separation

Functions should either do something or answer something, but not both.
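
A small sketch of the separation, assuming a hypothetical Attributes store. A combined boolean set(name, value) would leave the reader guessing what the returned boolean means; splitting it into a query and a command removes the ambiguity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical attribute store, for illustration only.
class Attributes {
    private final Map<String, String> values = new HashMap<>();

    // Query: answers a question and changes nothing.
    boolean attributeExists(String name) {
        return values.containsKey(name);
    }

    // Command: changes state and returns nothing.
    void setAttribute(String name, String value) {
        values.put(name, value);
    }

    // Query: reads the current value (null if the attribute is unset).
    String getAttribute(String name) {
        return values.get(name);
    }
}
```

A caller would then write `if (attributes.attributeExists("username")) { attributes.setAttribute("username", "unclebob"); }`, which reads unambiguously.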

Prefer Exceptions to Returning Error Codes

Returning error codes from command functions is a subtle violation of command/query separation.
Extract Try/Catch Blocks into functions of their own.
Error Handling Is One Thing. Functions should do one thing. Thus, a function that handles errors should do nothing else. This implies that if the keyword try exists in a function, it should be the very first word in the function and that there should be nothing after the catch/finally blocks.
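
The extraction described above might look roughly like this. PageStore, its in-memory list, and the error log are hypothetical stand-ins for a real persistence layer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory page store, for illustration only.
class PageStore {
    private final List<String> pages = new ArrayList<>();
    final List<String> errorLog = new ArrayList<>();

    void add(String page) {
        pages.add(page);
    }

    boolean contains(String page) {
        return pages.contains(page);
    }

    // Error handling is the only thing this function does:
    // 'try' is its first word and nothing follows the catch block.
    void delete(String page) {
        try {
            deletePageAndAllReferences(page);
        } catch (IllegalStateException e) {
            logError(e);
        }
    }

    // The normal processing, free of error handling.
    private void deletePageAndAllReferences(String page) {
        if (!pages.remove(page)) {
            throw new IllegalStateException("no such page: " + page);
        }
    }

    private void logError(Exception e) {
        errorLog.add(e.getMessage());
    }
}
```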

Comments

Don’t comment bad code - rewrite it.
The older a comment, and the farther away it is from the code it describes, the more likely it is to be just plain wrong. Programmers can’t realistically maintain them.
Inaccurate comments are far worse than no comments at all.
Comments do not make up for bad code. “Ooh, I’d better comment that!” No! You’d better clean it!
Explain yourself in code:

// Check to see if the employee is eligible for full benefits
if ((employee.flags & HOURLY_FLAG) &&
(employee.age > 65))

vs.

if (employee.isEligibleForFullBenefits())
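
One way the extracted method could be implemented is sketched below. The Employee class, the value of HOURLY_FLAG, and the constant FULL_BENEFITS_AGE are assumptions; only the condition itself comes from the snippet above.

```java
// Hypothetical Employee class; the flag value and field layout are assumptions.
class Employee {
    static final int HOURLY_FLAG = 0x1;
    private static final int FULL_BENEFITS_AGE = 65;

    private final int flags;
    private final int age;

    Employee(int flags, int age) {
        this.flags = flags;
        this.age = age;
    }

    // The method name now carries what the comment used to say.
    boolean isEligibleForFullBenefits() {
        return isHourly() && age > FULL_BENEFITS_AGE;
    }

    private boolean isHourly() {
        return (flags & HOURLY_FLAG) != 0;
    }
}
```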

Good Comments

Legal Comments
Informative Comments
Explanation of intent
Clarification. Before writing, take care there is no better way and then take even more care that they are accurate.
Warning of Consequences
TODO comments. This should not be an excuse to leave bad code in the system.
Amplifications. Javadocs can be just as misleading, nonlocal, and dishonest as any other kind of comment.

Bad Comments

Mumbling
Redundant Comments
Misleading Comments
Mandated Comments. It’s plain silly to have a rule that says that every function must have a javadoc, or every variable must have a comment.
Journal Comments (see Tanzer’s code).
Noise Comments
Don’t use a comment when you can use a function or a variable.
Position Markers
Closing Brace Comments. Try to shorten your function instead.
Attributions and bylines
Commented-Out code
HTML Comments. It should be the responsibility of the tool that generates HTML documentation to transform them.
Nonlocal information.
Too much information. Don’t put interesting historical discussions or irrelevant descriptions of details into your comments.
Inobvious Connection
Function headers
Javadocs in nonpublic code

Formatting

The team should agree on a single set of formatting rules and all members should comply.
Have an automated tool that applies those formatting rules for you.
Code formatting is about communication and communication is the professional developer’s first order of business.

Vertical Formatting

Small files are usually easier to understand than large files.
Vertical openness between concepts <-> vertical density.
Closely related concepts should be kept vertically close to each other.
Variables should be declared as close to their usage as possible. Because our functions are very short, local variables should appear at the top of each function.
If one function calls another, they should be vertically close and the caller should be above the callee, if possible.

Horizontal Formatting

Keep lines short: 100 or 120 characters. Beyond that it is probably just careless.
Use horizontal white space to associate things that are strongly related and disassociate things that are more weakly related.
Horizontal alignment (see Tanzer) is not useful. It seems to emphasize the wrong things and may lead the eye away from the true intent.
If we have a long list that needs to be aligned the problem is the length of the list, not the lack of alignment.
Avoid collapsing scopes down to one line (if, while … in C/Java).
A good software system is composed of a set of documents that read nicely. They need to have a consistent and smooth style.

Objects and Data Structures

There’s a reason that we keep our variables private. We don’t want anyone else to depend on them.
Hiding implementation is about abstractions. We do not want to expose the details of our data. Rather we want to express our data in abstract terms.
Serious thought needs to be put into the best way to represent the data an object contains. The worst option is to blithely add getters and setters.

Data/Object Anti-Symmetry

Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and have no meaningful functions.
OOP makes it easy to add new classes without changing existing functions. Procedural code makes it easy to add new functions without changing existing data structures.
Understand this and choose the approach that is best for the job at hand.
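
The trade-off can be sketched with shapes. All type names below (SquareData, CircleData, Geometry, Shape, Square) are hypothetical and exist only to contrast the two styles.

```java
// Procedural style: data structures expose their data, behaviour lives outside.
class SquareData {
    double side;
}

class CircleData {
    double radius;
}

class Geometry {
    // Adding a perimeter() function touches no data structure, but adding
    // a new shape means editing every function that looks like this one.
    static double area(Object shape) {
        if (shape instanceof SquareData) {
            SquareData square = (SquareData) shape;
            return square.side * square.side;
        }
        if (shape instanceof CircleData) {
            CircleData circle = (CircleData) shape;
            return Math.PI * circle.radius * circle.radius;
        }
        throw new IllegalArgumentException("unknown shape");
    }
}

// Object-oriented style: data is hidden behind behaviour.
// Adding a new shape touches no existing code, but adding a
// perimeter() method means changing every Shape implementation.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    public double area() {
        return side * side;
    }
}
```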

Law of Demeter

A module should not know about the innards of the objects it manipulates.
Train Wrecks (a().b().c()): whether this is a violation of the Law of Demeter depends on whether the internal structure should be hidden or exposed. If a, b, and c are just structures with no behaviour, the Law of Demeter doesn’t apply.
Hybrids make it hard to add new functions but also make it hard to add new data structures. Avoid creating them.
Hiding structure: if ctxt is an object, we should be telling it to do something; we should not be asking it about its internals.
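
A rough sketch of "tell, don't ask" for the ctxt case. The Options and Context classes and the scratchFilePath method are hypothetical; a string path is used here instead of a real file stream to keep the example small.

```java
// Hypothetical types, for illustration only.
class Options {
    private final String scratchDir;

    Options(String scratchDir) {
        this.scratchDir = scratchDir;
    }

    String getScratchDir() {
        return scratchDir;
    }
}

class Context {
    private final Options options;

    Context(Options options) {
        this.options = options;
    }

    // Exposes internals: callers end up writing
    // ctxt.getOptions().getScratchDir() + "/" + name themselves.
    Options getOptions() {
        return options;
    }

    // Hides structure: callers tell the context what they need.
    String scratchFilePath(String fileName) {
        return options.getScratchDir() + "/" + fileName;
    }
}
```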

Data Transfer Objects

A DTO is a class with public variables and no functions.
Beans have private variables and are accessed by getters/setters. The quasi-encapsulation provides no benefit.
Active Records are a special form of DTO with some navigational methods like save and find. Don’t put business logic in them (it creates a hybrid), treat an Active Record as a data structure and create objects that contain the business rules and hide their internal data (probably just instances of the Active Record).

Error Handling

Error handling is important, but if it obscures logic, it’s wrong.
Separate the different concerns: the algorithm for … and error handling.
It’s good practice to start with a try-catch-finally statement when you’re writing code that could throw exceptions. try blocks are like transactions; the catch has to leave the program in a consistent state.
Write tests that force exceptions, then add behaviour to your handler to satisfy your tests.
Use unchecked exceptions. The price of checked exceptions is Open/Closed Principle violation.
Provide context with exceptions - a stack trace can’t tell the intent of the operation that failed.
Wrap 3rd party APIs, incl. custom exception structure.
Don’t return null. When we do, we’re essentially creating work for ourselves and foisting problems upon our callers.
Don’t pass null: returning null from methods is bad, passing null into methods is worse. In most programming languages there is no good way to deal with a null that is passed by a caller accidentally. Therefore, the rational approach is to forbid passing null by default.
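
Two of the points above, returning an empty collection instead of null and providing context with an unchecked exception, could be sketched as follows. The Payroll and PayrollException names and the message format are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical unchecked exception that carries the intent of the failed operation.
class PayrollException extends RuntimeException {
    PayrollException(String message, Throwable cause) {
        super(message, cause);
    }
}

class Payroll {
    private final Map<String, List<Integer>> paymentsByEmployee = new HashMap<>();

    void record(String employee, List<Integer> payments) {
        paymentsByEmployee.put(employee, payments);
    }

    // Returns an empty list rather than null, so callers need no null check.
    List<Integer> paymentsFor(String employee) {
        return paymentsByEmployee.getOrDefault(employee, Collections.emptyList());
    }

    int totalFor(String employee) {
        int total = 0;
        for (int payment : paymentsFor(employee)) {
            total += payment;
        }
        return total;
    }

    // Wraps the low-level failure and adds what we were trying to do.
    int parseAmount(String employee, String rawAmount) {
        try {
            return Integer.parseInt(rawAmount);
        } catch (NumberFormatException e) {
            throw new PayrollException(
                "Could not read payment '" + rawAmount + "' for " + employee, e);
        }
    }
}
```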

Boundaries

Do not pass Maps (or any other interface at a boundary) around your system. Wrap them.
Learning tests - to check our understanding of new APIs. They are free (we need to learn the API anyway) and they allow us to check whether the 3rd party packages still work as expected on new releases.
Good SW design accommodates change without huge investment and rework. We should avoid letting too much of our code know about the third party particulars. It is better to depend on something you control than on something you don’t control, lest it end up controlling you.
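
Wrapping a Map at a boundary might look like the sketch below. The Sensor and Sensors classes and their methods are hypothetical; the point is only that the Map and its full interface stay private.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical domain type.
class Sensor {
    private final String id;

    Sensor(String id) {
        this.id = id;
    }

    String id() {
        return id;
    }
}

// The Map is an implementation detail. Callers see only the operations
// the application needs and cannot, for example, clear the map.
class Sensors {
    private final Map<String, Sensor> sensorsById = new HashMap<>();

    void add(Sensor sensor) {
        sensorsById.put(sensor.id(), sensor);
    }

    boolean contains(String id) {
        return sensorsById.containsKey(id);
    }

    Sensor getById(String id) {
        Sensor sensor = sensorsById.get(id);
        if (sensor == null) {
            throw new NoSuchElementException("unknown sensor: " + id);
        }
        return sensor;
    }
}
```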

Unit Tests

Law 1: You may not write production code until you have written a failing unit test.
Law 2: You may not write more of a unit test than is sufficient to fail, and not compiling is failing.
Law 3: You may not write more production code than is sufficient to pass the currently failing test.

Keeping Tests Clean

Having dirty tests is equivalent to, if not worse than, having no tests.
Test code is just as important as production code. It is not a second-class citizen. It requires thought, design, and care. It must be kept as clean as production code.

Clean Tests

What makes a clean test? Three things: readability, readability, and readability. In unit tests readability is perhaps even more important than in production code.
Readability means clarity, simplicity and density of expression.
Build-Operate-Check pattern. The first part builds up the test data, the second operates on that data, and the third checks that the operation yielded the expected results.
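
The Build-Operate-Check shape can be sketched without a test framework. StackTest and its check helper are hypothetical; in a real project this would more likely be a test in the team's test framework of choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Framework-free on purpose, to keep the sketch self-contained.
class StackTest {
    static void popReturnsLastPushedElement() {
        // Build: set up the test data.
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");

        // Operate: run the behaviour under test.
        String popped = stack.pop();

        // Check: verify the operation yielded the expected results.
        check("second".equals(popped), "pop should return the last pushed element");
        check(stack.size() == 1, "one element should remain");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```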

Domain-Specific testing languages

Rather than using the APIs of the system directly we build up a set of functions and utilities that make use of those APIs and make tests more convenient to write and easier to read. These functions and utilities become a specialized API used by the tests.

A Dual Standard

The code within the testing API does have a different set of engineering standards than production code. It must be simple, succinct, and expressive, but it need not be as efficient as production code.

One Assert Per Test

The number of asserts in a test ought to be minimized.

Single Concept per Test

We want to test a single concept in each test function.

F.I.R.S.T.

Fast - tests should be fast
Independent - tests should not depend on each other
Repeatable - tests should be repeatable in any environment. You should be able to run tests in the production environment, in the QA environment, and on your laptop while riding home on the train without a network. If your tests aren’t repeatable in any environment, you’ll always have an excuse for why they fail.
Self-Validating - tests should have a boolean output. Either they pass or fail.
Timely - unit tests should be written just before the production code that makes them pass.

Classes

There is seldom a good reason to have a public variable.
We like to put private utilities called by a public function right after the public function itself.

Small!

The first rule of classes is that they should be small. The second rule of classes is that they should be smaller than that.
Naming helps: if we cannot derive a concise name for a class, then it’s likely too large.
Class names including weasel words like Processor or Manager or Super often hint at unfortunate aggregation of responsibilities.
Getting software to work and making software clean are two very different activities.
The primary goal in managing complexity is to organize it so that a developer knows where to look for things and need only understand the directly affected complexity at any given time.
We want our system to be composed of many small classes, not a few large ones. Each small class encapsulates a single responsibility, has a single reason to change, and collaborates with a few others to achieve the desired system behaviour.

Cohesion

Classes should have a small number of instance variables.
Generally, the more variables a method manipulates, the more cohesive that method is to its class.
It is neither advisable nor possible to create maximally cohesive classes; on the other hand, we would like cohesion to be high.
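
A small stack is a convenient illustration of high cohesion: it has only two instance variables, and most methods use both. The IntStack class below is a sketch written for these notes, not a recommended stack implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Two instance variables; push and pop use both, size uses one.
class IntStack {
    private int topOfStack = 0;
    private final List<Integer> elements = new ArrayList<>();

    int size() {
        return topOfStack;
    }

    void push(int element) {
        topOfStack++;
        elements.add(element);
    }

    int pop() {
        if (topOfStack == 0) {
            throw new IllegalStateException("stack is empty");
        }
        // remove(int index) removes and returns the top element.
        return elements.remove(--topOfStack);
    }
}
```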

Organizing for Change

Isolating from Change

We want to structure our systems so that we muck with as little as possible when we update them with new or changed features. In an ideal system, we incorporate new features by extending the system, not by making modifications to existing code.
The lack of coupling means that elements of our system are better isolated from each other and from change.
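
Isolation from change usually means depending on an abstraction. In this sketch, Portfolio, StockExchange, and FixedStockExchangeStub are hypothetical names; the stub lets a test pin prices that would otherwise change constantly.

```java
import java.util.HashMap;
import java.util.Map;

// The abstraction the portfolio depends on.
interface StockExchange {
    long currentPriceInCents(String symbol);
}

class Portfolio {
    private final StockExchange exchange;
    private final Map<String, Integer> shares = new HashMap<>();

    Portfolio(StockExchange exchange) {
        this.exchange = exchange;
    }

    void add(int count, String symbol) {
        shares.merge(symbol, count, Integer::sum);
    }

    long valueInCents() {
        long total = 0;
        for (Map.Entry<String, Integer> holding : shares.entrySet()) {
            total += holding.getValue() * exchange.currentPriceInCents(holding.getKey());
        }
        return total;
    }
}

// Test double with fixed prices; it assumes every queried symbol was fixed first.
class FixedStockExchangeStub implements StockExchange {
    private final Map<String, Long> prices = new HashMap<>();

    void fix(String symbol, long priceInCents) {
        prices.put(symbol, priceInCents);
    }

    public long currentPriceInCents(String symbol) {
        return prices.get(symbol);
    }
}
```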

Systems

Separate Constructing a System from Using It

Construction is a very different process from use.
Software systems should separate the startup process (the main function), when the application objects are constructed and the dependencies are “wired” together, from the runtime logic that takes over after startup.
Factories
Dependency Injection. Inversion of Control (IoC) moves secondary responsibilities from an object to other objects that are dedicated to that purpose, therefore supporting the Single Responsibility Principle.
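
A minimal sketch of constructor injection with the wiring done in main. MessageSink, ConsoleSink, OrderService, and Application are hypothetical names; no DI framework is assumed.

```java
// The dependency, expressed as an abstraction.
interface MessageSink {
    void send(String message);
}

class ConsoleSink implements MessageSink {
    public void send(String message) {
        System.out.println(message);
    }
}

// The service never constructs its own dependency; it is handed one.
class OrderService {
    private final MessageSink sink;

    OrderService(MessageSink sink) {
        this.sink = sink;
    }

    void placeOrder(String item) {
        sink.send("ordered: " + item);
    }
}

class Application {
    public static void main(String[] args) {
        // Construction: build the objects and wire the dependencies.
        OrderService service = new OrderService(new ConsoleSink());

        // Use: runtime logic only talks to objects that already exist.
        service.placeOrder("book");
    }
}
```

Because OrderService receives its sink, a test can pass in a recording sink instead of the console.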

Scaling Up

It’s a myth that we can get systems “right the first time”. Instead, we implement only today’s stories, then refactor and expand the system to implement new stories tomorrow. TDD, refactoring, and clean code make this work at the code level.
Software systems are unique compared to physical systems. Their architecture can grow incrementally, if we maintain the proper separation of concerns.

Pure Java AOP Frameworks

In Spring, you write your business logic as POJOs. POJOs are purely focused on their domain. They have no dependencies on enterprise frameworks (or any other domain). They allow for truly test-driving the application, without doing a Big Design Up Front.
We can start a SW project with a “naively simple” but nicely decoupled architecture, deliver working user stories quickly, then add more infrastructure as we scale up.
A good API should largely disappear from view most of the time, so the team expends the majority of its creative efforts focused on the user stories being implemented. If not, then the architectural constraints will inhibit the efficient delivery of optimal value to the customer.

Optimize Decision Making

Postpone decisions until the last possible moment. This isn’t lazy or irresponsible; it lets us make informed choices with the best possible information.

Systems Need Domain-Specific Languages

If you are implementing domain logic in the same language that a domain expert uses, there is less risk that you will incorrectly translate the domain into implementation.

Conclusion

At all levels of abstraction, the intent should be clear.
Never forget to use the simplest thing that can possibly work.

Emergence

Is there a set of simple practices that can replace experience? Clearly not.

Rule 1: Runs All the Tests

Tight coupling makes it difficult to write tests.
OOP goal of low coupling and high cohesion. Writing tests leads to better designs.

Rules 2-4: Refactoring

No Duplication

Template-method helps

Expressive

Code should clearly express the intent of its author.
You can express yourself by choosing good names. We want to be able to hear a class or a function name and not be surprised when we discover its responsibilities.
You can also express yourself by keeping your functions and classes small. Small classes and functions are usually easy to name, easy to write, and easy to understand.
The most important way to be expressive is to try. Remember, the most likely next person to read that code will be you.
Take a little pride in your workmanship. Spend a little time with each of your functions and classes. Choose better names, split large functions into smaller functions and generally just take care of what you’ve created. Care is a precious resource.

Minimal Classes and Methods

High class and method counts are sometimes the result of pointless dogmatism.
Our goal is to keep our overall system small while we are also keeping our functions and classes small. Remember, however, that this rule is the lowest priority of the four rules of Simple Design. So, although it is important to keep class and function count low, it’s more important to have tests, eliminate duplication, and express yourself.

Concurrency

Objects are abstractions of processing. Threads are abstractions of schedule.
Concurrency is a decoupling strategy: it decouples what gets done from when it gets done. This can dramatically improve both the throughput and the structure of an application.

Writing clean concurrent programs is hard - very hard.

The design of a concurrent algorithm can be remarkably different from the design of a single-threaded system.
Concurrency incurs some overhead, both in performance and in writing additional code.
Concurrency bugs aren’t usually repeatable.
Think about shut-down early and get it working early. It’s going to take longer than you expect.

Concurrency Defense Principles

Keep your concurrency-related code separate from other code.
Take data encapsulation to heart; severely limit the access of any data that may be shared.
Use copies of data. Collect results from multiple threads and then merge the results in a single thread.
Partition data into independent subsets that can be operated on by independent threads, possibly on different processors.
Keep synchronized sections small.
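
Partitioning the data and merging partial results in a single thread can be sketched as below. ParallelSum is a hypothetical example built on java.util.concurrent; each task reads only its own slice of the array and no mutable state is shared, so no synchronized section is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Concurrency-related code is kept apart from the plain summing logic.
class ParallelSum {
    // 'partitions' is assumed to be at least 1.
    static long sum(int[] numbers, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            int chunk = (numbers.length + partitions - 1) / partitions;
            List<Future<Long>> partialSums = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                int from = Math.min(p * chunk, numbers.length);
                int to = Math.min(from + chunk, numbers.length);
                // Each task works on an independent subset of the data.
                Callable<Long> task = () -> sumRange(numbers, from, to);
                partialSums.add(pool.submit(task));
            }
            // Merge the partial results in the calling thread only.
            long total = 0;
            for (Future<Long> partial : partialSums) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            // Shut-down is part of the design from the start.
            pool.shutdown();
        }
    }

    // Single-threaded logic that can be tested on its own.
    private static long sumRange(int[] numbers, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += numbers[i];
        }
        return sum;
    }
}
```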

Testing Threaded Code

Get your nonthreaded code working first.
Treat spurious failures as candidate threading issues. Don’t ignore system failures as one-offs.
Make your threaded code pluggable, so you can run it in various configurations.
Run with more threads than processors (cores), to encourage task switching.
Jiggle the code so that threads run in different orderings at different times. The combination of well-written tests and jiggling can dramatically increase the chance of finding errors.

Successive Refinement

You should be able to read the code from top to bottom without a lot of jumping around or looking ahead.
Programming is more a craft than it is a science. To write clean code, you must first write dirty code and then clean it.
Most freshman programmers think that the primary goal is to get the program working. Once it is “working”, they move on to the next task, leaving the program in whatever state they got it to “work”. Most seasoned programmers know that this is professional suicide.
“If the structure of this code was ever going to be maintainable, now was the time to fix it. So I stopped adding features and started refactoring”.
Refactoring is a lot like solving a Rubik’s cube. There are a lot of little steps required to achieve a large goal. Each step enables the next.
Bad schedules can be redone, bad requirements can be redefined. Bad team dynamics can be repaired. Bad code rots and ferments, becoming an inexorable weight that drags the team down.

Refactoring SerialDate

It is only through critiques that we learn. Doctors do it. Lawyers do it. Pilots do it. And we programmers need to learn how to do it too.
First make it work. Get the coverage and extend/make the unit tests work.
Then make it right.
Get your code into a form that’s easy to change.

Smells and Heuristics

Comments

Inappropriate information
Obsolete comment
Redundant comment
Poorly written comment
Commented-out code

Environment

Build requires more than one step
Running tests requires more than one step

Functions

Too many arguments
Output arguments
Flag arguments
Not used (dead) functions

General

Multiple languages in one source file.
Obvious behaviour is unimplemented - Principle of Least Surprise.
Incorrect behaviour at the boundaries. Don’t rely on your intuition. Prove that your code works in all the corner cases.
Overridden safeties - turned-off failing tests, disabled compiler warnings, etc.
Duplication.
Code at the wrong level of abstraction. Constants, variables, and utility functions that pertain only to the detailed implementation should not be present in the base class. Don’t mix higher and lower level concepts together. You cannot lie or fake your way out of a misplaced abstraction. Isolating abstractions is one of the hardest things that software developers can do, and there is no quick fix when you get it wrong.
Base classes depending on their derivatives.
Too much information. Well-defined modules have very small interfaces that allow you to do a lot with a little. Hide your data, hide your utility functions, hide your constants and hide your temporaries. Don’t create classes with a lot of methods and instance variables. Don’t create lots of protected variables and functions for your subclasses. Help keep coupling low by hiding information.
Dead code
Vertical separation
Inconsistency - goes back to the principle of least surprise.
Clutter.
Artificial coupling.
Feature envy. The methods of a class should be interested in the variables and functions of the class they belong to, and not the variables and functions of other classes.
Selector arguments. In general it is better to have many functions than to pass some code into a function to select behaviour.
Obscured intent.
Misplaced responsibility - code should be placed where a reader would naturally expect it to be.
Inappropriate static - there might be a reasonable chance that we’ll want the function to be polymorphic.
Use explanatory variables.
Function names should say what they do.
Understand the algorithm.
Make logical dependencies (i.e. assumptions, e.g. a constant) physical (calling a method providing the data).
Prefer polymorphism to If/Else or Switch/Case
Follow standard conventions - coding standards.
Replace magic numbers with named constants.
Be precise - don’t be lazy
- expecting the first match to be the only is naive
- don’t use floating-point numbers to represent currency
- use locks and/or transactions (TX) when concurrent updates are possible
- don’t be too specific, e.g. declaring a variable as ArrayList when List is good enough
- making all variables protected by default is not constraining enough
Structure over convention, e.g. abstract methods > switch with nicely named enumerations.
Encapsulate conditionals. Turn if (expected == null || actual == null || areStringsEqual()) into a method that returns a boolean.
Avoid negative conditionals.
Functions should do one thing.
Hidden temporal couplings -> make explicit.
Don’t be arbitrary.
Encapsulate boundary conditions. We don’t want swarms of +1s and -1s all over the code. Encapsulate them in variables.
Functions should descend only one level of abstraction.
Keep configurable data at high levels.
Avoid transitive navigation (Law of Demeter - write shy code).
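
A few of the heuristics above (named constants instead of magic numbers, encapsulated conditionals, and no floating-point numbers for currency) can be combined in one sketch. The Invoice class, the 30-day payment term, and the fee handling are made-up assumptions.

```java
import java.math.BigDecimal;

// Hypothetical invoice, for illustration only.
class Invoice {
    // Named constant instead of a bare 30 scattered through the code.
    private static final int PAYMENT_TERM_DAYS = 30;

    // BigDecimal gives exact decimal arithmetic, unlike double.
    private final BigDecimal amount;
    private final int ageInDays;
    private final boolean paid;

    Invoice(String amount, int ageInDays, boolean paid) {
        this.amount = new BigDecimal(amount);
        this.ageInDays = ageInDays;
        this.paid = paid;
    }

    // Encapsulated conditional: callers read isOverdue(), not the raw expression.
    boolean isOverdue() {
        return !paid && ageInDays > PAYMENT_TERM_DAYS;
    }

    BigDecimal amountWithFee(String fee) {
        return amount.add(new BigDecimal(fee));
    }
}
```

With doubles, 0.1 + 0.2 does not compare equal to 0.3, whereas adding the BigDecimal values "0.10" and "0.20" yields exactly "0.30".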

Java

Avoid long import lists by using wildcards.
Don’t inherit constants. Use static import instead.
Constants vs. enums - use enums.
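
As a sketch of why enums tend to beat int constants: an enum value can carry data and behaviour, and the compiler rejects values outside the set. The HourlyPayGrade enum and its rates are hypothetical.

```java
// Hypothetical pay grades; rates are in cents per hour and purely illustrative.
enum HourlyPayGrade {
    APPRENTICE(1000),
    JOURNEYMAN(1500),
    MASTER(2500);

    private final int centsPerHour;

    HourlyPayGrade(int centsPerHour) {
        this.centsPerHour = centsPerHour;
    }

    // Behaviour lives with the constant instead of in a switch elsewhere.
    int payForHours(int hours) {
        return hours * centsPerHour;
    }
}
```

A method that accepts HourlyPayGrade cannot be called with an arbitrary int, which a `static final int MASTER = 2` style constant would allow.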

Names

Choose descriptive names. Don’t be too quick to choose a name.
Choose names at the appropriate level of abstraction.
Use standard nomenclature where possible.
Unambiguous names.
Use long names for long scopes.
Avoid encodings.
Names should describe side-effects.

Tests

Insufficient tests.
Use a coverage tool - they report gaps in your testing strategy.
Don’t skip trivial tests.
An ignored test is a question about an ambiguity.
Test boundary conditions.
Exhaustive tests near bugs. Bugs tend to congregate. When you find a bug in a function, it is wise to do an exhaustive test of that function.
Test coverage patterns can be revealing.
Tests should be fast.
