From 7a4601dd5071abf4bbef490ab198e69a0edcf703 Mon Sep 17 00:00:00 2001 From: gdut-yy Date: Wed, 1 Jan 2020 17:11:49 +0800 Subject: [PATCH] =?UTF-8?q?md=20=E5=BC=95=E5=85=A5=E5=9B=BE=E7=89=87?= =?UTF-8?q?=E8=B5=84=E6=BA=90?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../.vuepress/public/figures/apA/322equ01.jpg | Bin 0 -> 1439 bytes docs/apA.md | 189 +++++++----------- docs/ch1.md | 58 +++--- docs/ch10.md | 2 +- docs/ch11.md | 8 +- docs/ch12.md | 13 +- docs/ch13.md | 107 ++++------ docs/ch14.md | 2 +- docs/ch15.md | 2 +- docs/ch16.md | 2 +- docs/ch17.md | 11 +- docs/ch2.md | 57 +++--- docs/ch3.md | 67 +++---- docs/ch4.md | 7 +- docs/ch5.md | 11 +- docs/ch6.md | 16 +- docs/ch7.md | 7 +- docs/ch8.md | 7 +- docs/ch9.md | 9 +- 19 files changed, 246 insertions(+), 329 deletions(-) create mode 100644 docs/.vuepress/public/figures/apA/322equ01.jpg diff --git a/docs/.vuepress/public/figures/apA/322equ01.jpg b/docs/.vuepress/public/figures/apA/322equ01.jpg new file mode 100644 index 0000000000000000000000000000000000000000..b1f4ed5399e645fdbf864a991fe274cb3a4303e4 GIT binary patch literal 1439 zcma)$XHe5=5XRqsLL?|H)Bs{jC?kSILYFRZbOa)iqJn^w3sLD1IEskqaY{+zA&MZP zpa`KS7>d%woDPYgKq4TZo@&~K?mVJbq6r}D?BL`8AQ%84 zn*~I(0CgztWKfEf7O1R^1jwh18zhk}sKO$QQh*NH9`SMg8xMXcS1oq0JC*6tDuH z$t}#Fg(FO3`&|LTG7{sKO8siqzW9)wY7I7`O}eFV4_mmx{qoV?*MO55)u~N_TWdww z{)*U=C6`fxjP<}hb9ZzzClzWqehjuN4cwFZh-5$HQcyQeGZLH{W0+%RDbxvWW({{a zGbh5V$_xG>=v;xY2y?d(*0|63^`u{gT|z`tczN&dGk8J~+5dq{;nkyLjXB-s_R*W+ zcZYo)Lf(7IpdH6~sVh<0F|3TC?)hpJZ1GX-a0Z<2`}704L|0p^3O2(c6riy|Ow2d4%6I&LRPIRu}a&DhdR3Db@dXU;x?<#|LPs&+; zljPCHk?L8dPfPaoTk*b7dMeW{?WF0+2;ME8mUE?aG_>|CCR}-1xiLW;p@kOet@ZGo zw2*-}e9ljs=Qmw^L7!D+93xPyOZAgmhD?jf{4hRce$!2+*~(tt=v}7C{L-eT=UR`v z-FOczs@n%V5gn8<#vDI(Ssvq)hb`a?^dw%b!o)22Xyy*es{|s>)yb9EsXK$@R;V#0 z^YeQT79oW}5aPs{$8rw!K627kEF2!s3Tkw$6qvmVzb1#(Fzokdu2^sNOy&jrp})|t zozJ~ZQ}r}pM)OYkhE2)3CG1#ht{&#?Jk(M`P_8!;!mwTs=_)oKxG5deagn|2;cEOg zf5qF38eQ9HROmjF*Rh_kPNh|HBvUi!>)3At8#;-77R%zzB|AEHNBnX`U6?n!-gVzf zIxL{^`I#l=@tFAu4HJ^vc}WK}KRry6fbPXdbmJ}%-tI=)>NUw``z_W)&DTQF?!1kf zwpHvnYr?Fr@?d;|EzyfnmSV~da@$5Xl&wGA4)1=ot?I@K&M)irx6pcvbWc*#s;tR+n?259_x&0MQX&hyPtpi zByUwd^C?SV1=C!iWD@xd=QUweDZn2SfuF9^UNQ!&^d*Ybw20mFf=Fo@r^w4K$~sau zhjSFv=a{65R|OG*QJ6c+%RXUEny-R+Tg>;>_N>Pf2luvVPU^iR#-vjX@12fwHPk86 zG}-HMgyfEoGh)YyOB>|fZ;iKXc+)>wq$J2b;XgNAQ%Y81bam3q?8RG8{XKPY& zpIjTH$z81KLz^jW^(wE`1Mkw&>=a$h*)X%Lx+nCNVsFbA?3U<*0fjDUSPsi{GTw?X z^rhn(9a3{zx}23-d*GADh!%ZvPIqR%@zhkJt;Z~Te`GC3Drxjhh+p55sv}$Y^%gum z@n=dFr|?GLU4z#b9v=>~bm{Nt^_@|T@kt910r4ThpX9N#oc4RS(ghU6>7AmHzW~@C BL;U~% literal 0 HcmV?d00001 diff --git a/docs/apA.md b/docs/apA.md index 703cf8e..7854342 100644 --- a/docs/apA.md +++ b/docs/apA.md @@ -49,9 +49,8 @@ This is a classic example of validating the throughput of a system. This system What happens if the test fails? Short of developing some kind of event polling loop, there is not much to do within a single thread that will make this code any faster. Will using multiple threads solve the problem? It might, but we need to know where the time is being spent. There are two possibilities: -• I/O—using a socket, connecting to a database, waiting for virtual memory swapping, and so on. - -• Processor—numerical calculations, regular expression processing, garbage collection, and so on. +- I/O—using a socket, connecting to a database, waiting for virtual memory swapping, and so on. +- Processor—numerical calculations, regular expression processing, garbage collection, and so on. Systems typically have some of each, but for a given operation one tends to dominate. If the code is processor bound, more processing hardware can improve throughput, making our test pass. But there are only so many CPU cycles available, so adding threads to a processor-bound problem will not make it go faster. @@ -91,13 +90,10 @@ How many threads might our server create? The code sets no limit, so the we coul But set the behavioral problem aside for the moment. The solution shown has problems of cleanliness and structure. How many responsibilities does the server code have? -• Socket connection management - -• Client processing - -• Threading policy - -• Server shutdown policy +- Socket connection management +- Client processing +- Threading policy +- Server shutdown policy Unfortunately, all these responsibilities live in the process function. In addition, the code crosses many different levels of abstraction. So, small as the process function is, it needs to be repartitioned. @@ -181,15 +177,13 @@ Review the method incrementValue, a one-line Java method with no looping or bran ``` Ignore integer overflow and assume that only one thread has access to a single instance of IdGenerator. In this case there is a single path of execution and a single guaranteed result: -• The value returned is equal to the value of lastIdUsed, both of which are one greater than just before calling the method. +- The value returned is equal to the value of lastIdUsed, both of which are one greater than just before calling the method. What happens if we use two threads and leave the method unchanged? What are the possible outcomes if each thread calls incrementValue once? How many possible paths of execution are there? First, the outcomes (assume lastIdUsed starts with a value of 93): -• Thread 1 gets the value of 94, thread 2 gets the value of 95, and lastIdUsed is now 95. - -• Thread 1 gets the value of 95, thread 2 gets the value of 94, and lastIdUsed is now 95. - -• Thread 1 gets the value of 94, thread 2 gets the value of 94, and lastIdUsed is now 94. +- Thread 1 gets the value of 94, thread 2 gets the value of 95, and lastIdUsed is now 95. +- Thread 1 gets the value of 95, thread 2 gets the value of 94, and lastIdUsed is now 95. +- Thread 1 gets the value of 94, thread 2 gets the value of 94, and lastIdUsed is now 94. The final result, while surprising, is possible. To see how these different results are possible, we need to understand the number of possible paths of execution and how the Java Virtual Machine executes them. @@ -200,7 +194,7 @@ To calculate the number of possible execution paths, we’ll start with the gene For this simple case of N instructions in a sequence, no looping or conditionals, and T threads, the total number of possible execution paths is equal to -Image +![](figures/apA/322equ01.jpg) Calculating the Possible Orderings @@ -255,17 +249,17 @@ What about the pre-increment operator, ++, on line 9? The pre-increment operator Before we go any further, here are three definitions that will be important: -• Frame—Every method invocation requires a frame. The frame includes the return address, any parameters passed into the method and the local variables defined in the method. This is a standard technique used to define a call stack, which is used by modern languages to allow for basic function/method invocation and to allow for recursive invocation. +- Frame—Every method invocation requires a frame. The frame includes the return address, any parameters passed into the method and the local variables defined in the method. This is a standard technique used to define a call stack, which is used by modern languages to allow for basic function/method invocation and to allow for recursive invocation. -• Local variable—Any variables defined in the scope of the method. All nonstatic methods have at least one variable, this, which represents the current object, the object that received the most recent message (in the current thread), which caused the method invocation. +- Local variable—Any variables defined in the scope of the method. All nonstatic methods have at least one variable, this, which represents the current object, the object that received the most recent message (in the current thread), which caused the method invocation. -• Operand stack—Many of the instructions in the Java Virtual Machine take parameters. The operand stack is where those parameters are put. The stack is a standard last-in, first-out (LIFO) data structure. +- Operand stack—Many of the instructions in the Java Virtual Machine take parameters. The operand stack is where those parameters are put. The stack is a standard last-in, first-out (LIFO) data structure. Here is the byte-code generated for resetId(): -Image +![](figures/apA/0324tab01.jpg) -Image +![](figures/apA/0325tab01.jpg) These three instructions are guaranteed to be atomic because, although the thread executing them could be interrupted after any one of them, the information for the PUTFIELD instruction (the constant value 0 on the top of the stack and the reference to this one below the top, along with the field value) cannot be touched by another thread. So when the assignment occurs, we are guaranteed that the value 0 will be stored in the field value. The operation is atomic. The operands all deal with information local to the method, so there is no interference between multiple threads. @@ -273,7 +267,7 @@ So if these three instructions are executed by ten threads, there are 4.38679733 With the ++ operation in the getNextId method, there are going to be problems. Assume that lastId holds 42 at the beginning of this method. Here is the byte-code for this new method: -Image +![](figures/apA/0325tab02.jpg) Imagine the case where the first thread completes the first three instructions, up to and including GETFIELD, and then it is interrupted. A second thread takes over and performs the entire method, incrementing lastId by one; it gets 43 back. Then the first thread picks up where it left off; 42 is still on the operand stack because that was the value of lastId when it executed GETFIELD. It adds one to get 43 again and stores the result. The value 43 is returned to the first thread as well. The result is that one of the increments is lost because the first thread stepped on the second thread after the second thread interrupted the first thread. @@ -284,11 +278,9 @@ An intimate understanding of byte-code is not necessary to understand how thread That being said, what this trivial example demonstrates is a need to understand the memory model enough to know what is and is not safe. It is a common misconception that the ++ (pre- or post-increment) operator is atomic, and it clearly is not. This means you need to know: -• Where there are shared objects/values - -• The code that can cause concurrent read/update issues - -• How to guard such concurrent issues from happening +- Where there are shared objects/values +- The code that can cause concurrent read/update issues +- How to guard such concurrent issues from happening KNOWING YOUR LIBRARY Executor Framework @@ -371,13 +363,10 @@ When a method attempts to update a shared variable, the CAS operation verifies t Nonthread-Safe Classes There are some classes that are inherently not thread safe. Here are a few examples: -• SimpleDateFormat - -• Database Connections - -• Containers in java.util - -• Servlets +- SimpleDateFormat +- Database Connections +- Containers in java.util +- Servlets Note that some collection classes have individual methods that are thread-safe. However, any operation that involves calling more than one method is not. For example, if you do not want to replace something in a HashTable because it is already there, you might write the following code: ```java @@ -387,14 +376,14 @@ Note that some collection classes have individual methods that are thread-safe. ``` Each individual method is thread-safe. However, another thread might add a value in between the containsKey and put calls. There are several options to fix this problem. -• Lock the HashTable first, and make sure all other users of the HashTable do the same—client-based locking: +- Lock the HashTable first, and make sure all other users of the HashTable do the same—client-based locking: ```java synchronized(map) { if(!map.conainsKey(key)) map.put(key, value); } ``` -• Wrap the HashTable in its own object and use a different API—server-based locking using an ADAPTER: +- Wrap the HashTable in its own object and use a different API—server-based locking using an ADAPTER: ```java public class WrappedHashtable { private Map map = new Hashtable(); @@ -405,7 +394,7 @@ Each individual method is thread-safe. However, another thread might add a value } } ``` -• Use the thread-safe collections: +- Use the thread-safe collections: ```java ConcurrentHashMap map = new ConcurrentHashMap(); @@ -448,11 +437,9 @@ This is a real problem and an example of the kinds of problems that crop up in c You have three options: -• Tolerate the failure. - -• Solve the problem by changing the client: client-based locking - -• Solve the problem by changing the server, which additionally changes the client: server-based locking +- Tolerate the failure. +- Solve the problem by changing the client: client-based locking +- Solve the problem by changing the server, which additionally changes the client: server-based locking Tolerate the Failure Sometimes you can set things up such that the failure causes no harm. For example, the above client could catch the exception and clean up. Frankly, this is a bit sloppy. It’s rather like cleaning up memory leaks by rebooting at midnight. @@ -522,19 +509,15 @@ In this case we actually change the API of our class to be multithread aware.3 T In general you should prefer server-based locking for these reasons: -• It reduces repeated code—Client-based locking forces each client to lock the server properly. By putting the locking code into the server, clients are free to use the object and not worry about writing additional locking code. - -• It allows for better performance—You can swap out a thread-safe server for a non-thread safe one in the case of single-threaded deployment, thereby avoiding all overhead. - -• It reduces the possibility of error—All it takes is for one programmer to forget to lock properly. - -• It enforces a single policy—The policy is in one place, the server, rather than many places, each client. - -• It reduces the scope of the shared variables—The client is not aware of them or how they are locked. All of that is hidden in the server. When things break, the number of places to look is smaller. +- It reduces repeated code—Client-based locking forces each client to lock the server properly. By putting the locking code into the server, clients are free to use the object and not worry about writing additional locking code. +- It allows for better performance—You can swap out a thread-safe server for a non-thread safe one in the case of single-threaded deployment, thereby avoiding all overhead. +- It reduces the possibility of error—All it takes is for one programmer to forget to lock properly. +- It enforces a single policy—The policy is in one place, the server, rather than many places, each client. +- It reduces the scope of the shared variables—The client is not aware of them or how they are locked. All of that is hidden in the server. When things break, the number of places to look is smaller. What if you do not own the server code? -• Use an ADAPTER to change the API and add locking +- Use an ADAPTER to change the API and add locking ```java public class ThreadSafeIntegerIterator { private IntegerIterator iterator = new IntegerIterator(); @@ -546,7 +529,7 @@ What if you do not own the server code? } } ``` -• OR better yet, use the thread-safe collections with extended interfaces +- OR better yet, use the thread-safe collections with extended interfaces INCREASING THROUGHPUT Let’s assume that we want to go out on the net and read the contents of a set of pages from a list of URLs. As each page is read, we will parse it to accumulate some statistics. Once all the pages are read, we will print a summary report. @@ -600,18 +583,16 @@ Notice that we’ve kept the synchronized block very small. It contains just the Single-Thread Calculation of Throughput Now lets do some simple calculations. For the purpose of argument, assume the following: -• I/O time to retrieve a page (average): 1 second - -• Processing time to parse page (average): .5 seconds - -• I/O requires 0 percent of the CPU while processing requires 100 percent. +- I/O time to retrieve a page (average): 1 second +- Processing time to parse page (average): .5 seconds +- I/O requires 0 percent of the CPU while processing requires 100 percent. For N pages being processed by a single thread, the total execution time is 1.5 seconds * N. Figure A-1 shows a snapshot of 13 pages or about 19.5 seconds. Figure A-1 Single thread -Image +![](figures/apA/x01-1single_thread.jpg) Multithread Calculation of Throughput If it is possible to retrieve pages in any order and process the pages independently, then it is possible to use multiple threads to increase throughput. What happens if we use three threads? How many pages can we acquire in the same time? @@ -621,30 +602,25 @@ As you can see in Figure A-2, the multithreaded solution allows the process-boun Figure A-2 Three concurrent threads -Image +![](figures/apA/x01-2multi_thread.jpg) DEADLOCK Imagine a Web application with two shared resource pools of some finite size: -• A pool of database connections for local work in process storage - -• A pool of MQ connections to a master repository +- A pool of database connections for local work in process storage +- A pool of MQ connections to a master repository Assume there are two operations in this application, create and update: -• Create—Acquire connection to master repository and database. Talk to service master repository and then store work in local work in process database. - -• Update—Acquire connection to database and then master repository. Read from work in process database and then send to the master repository +- Create—Acquire connection to master repository and database. Talk to service master repository and then store work in local work in process database. +- Update—Acquire connection to database and then master repository. Read from work in process database and then send to the master repository What happens when there are more users than the pool sizes? Consider each pool has a size of ten. -• Ten users attempt to use create, so all ten database connections are acquired, and each thread is interrupted after acquiring a database connection but before acquiring a connection to the master repository. - -• Ten users attempt to use update, so all ten master repository connections are acquired, and each thread is interrupted after acquiring the master repository but before acquiring a database connection. - -• Now the ten “create” threads must wait to acquire a master repository connection, but the ten “update” threads must wait to acquire a database connection. - -• Deadlock. The system never recovers. +- Ten users attempt to use create, so all ten database connections are acquired, and each thread is interrupted after acquiring a database connection but before acquiring a connection to the master repository. +- Ten users attempt to use update, so all ten master repository connections are acquired, and each thread is interrupted after acquiring the master repository but before acquiring a database connection. +- Now the ten “create” threads must wait to acquire a master repository connection, but the ten “update” threads must wait to acquire a database connection. +- Deadlock. The system never recovers. This might sound like an unlikely situation, but who wants a system that freezes solid every other week? Who wants to debug a system with symptoms that are so difficult to reproduce? This is the kind of problem that happens in the field, then takes weeks to solve. @@ -654,20 +630,16 @@ A typical “solution” is to introduce debugging statements to find out what i To really solve the problem of deadlock, we need to understand what causes it. There are four conditions required for deadlock to occur: -• Mutual exclusion - -• Lock & wait - -• No preemption - -• Circular wait +- Mutual exclusion +- Lock & wait +- No preemption +- Circular wait Mutual Exclusion Mutual exclusion occurs when multiple threads need to use the same resources and those resources -• Cannot be used by multiple threads at the same time. - -• Are limited in number. +- Cannot be used by multiple threads at the same time. +- Are limited in number. A common example of such a resource is a database connection, a file open for write, a record lock, or a semaphore. @@ -683,18 +655,16 @@ This is also referred to as the deadly embrace. Imagine two threads, T1 and T2, Figure A-3 -Image +![](figures/apA/x01-3breaking_cycle.jpg) All four of these conditions must hold for deadlock to be possible. Break any one of these conditions and deadlock is not possible. Breaking Mutual Exclusion One strategy for avoiding deadlock is to sidestep the mutual exclusion condition. You might be able to do this by -• Using resources that allow simultaneous use, for example, AtomicInteger. - -• Increasing the number of resources such that it equals or exceeds the number of competing threads. - -• Checking that all your resources are free before seizing any. +- Using resources that allow simultaneous use, for example, AtomicInteger. +- Increasing the number of resources such that it equals or exceeds the number of competing threads. +- Checking that all your resources are free before seizing any. Unfortunately, most resources are limited in number and don’t allow simultaneous use. And it’s not uncommon for the identity of the second resource to be predicated on the results of operating on the first. But don’t be discouraged; there are three conditions left. @@ -703,9 +673,8 @@ You can also eliminate deadlock if you refuse to wait. Check each resource befor This approach introduces several potential problems: -• Starvation—One thread keeps being unable to acquire the resources it needs (maybe it has a unique combination of resources that seldom all become available). - -• Livelock—Several threads might get into lockstep and all acquire one resource and then release one resource, over and over again. This is especially likely with simplistic CPU scheduling algorithms (think embedded devices or simplistic hand-written thread balancing algorithms). +- Starvation—One thread keeps being unable to acquire the resources it needs (maybe it has a unique combination of resources that seldom all become available). +- Livelock—Several threads might get into lockstep and all acquire one resource and then release one resource, over and over again. This is especially likely with simplistic CPU scheduling algorithms (think embedded devices or simplistic hand-written thread balancing algorithms). Both of these can cause poor throughput. The first results in low CPU utilization, whereas the second results in high and useless CPU utilization. @@ -723,9 +692,8 @@ In the example above with Thread 1 wanting both Resource 1 and Resource 2 and Th More generally, if all threads can agree on a global ordering of resources and if they all allocate resources in that order, then deadlock is impossible. Like all the other strategies, this can cause problems: -• The order of acquisition might not correspond to the order of use; thus a resource acquired at the start might not be used until the end. This can cause resources to be locked longer than strictly necessary. - -• Sometimes you cannot impose an order on the acquisition of resources. If the ID of the second resource comes from an operation performed on the first, then ordering is not feasible. +- The order of acquisition might not correspond to the order of use; thus a resource acquired at the start might not be used until the end. This can cause resources to be locked longer than strictly necessary. +- Sometimes you cannot impose an order on the acquisition of resources. If the ID of the second resource comes from an operation performed on the first, then ordering is not feasible. So there are many ways to avoid deadlock. Some lead to starvation, whereas others make heavy use of the CPU and reduce responsiveness. TANSTAAFL!5 @@ -746,13 +714,10 @@ How can we write a test to demonstrate the following code is broken? ``` Here’s a description of a test that will prove the code is broken: -• Remember the current value of nextId. - -• Create two threads, both of which call takeNextId() once. - -• Verify that nextId is two more than what we started with. - -• Run this until we demonstrate that nextId was only incremented by one instead of two. +- Remember the current value of nextId. +- Create two threads, both of which call takeNextId() once. +- Verify that nextId is two more than what we started with. +- Run this until we demonstrate that nextId was only incremented by one instead of two. Listing A-2 shows such a test: @@ -798,9 +763,9 @@ Listing A-2 ClassWithThreadingProblemTest.java 36: } 37: } ``` -Image +![](figures/apA/0340tab01.jpg) -Image +![](figures/apA/0341tab01.jpg) This test certainly sets up the conditions for a concurrent update problem. However, the problem occurs so infrequently that the vast majority of times this test won’t detect it. @@ -814,15 +779,15 @@ So what approaches can we take to demonstrate this simple failure? And, more imp Here are a few ideas: -• Monte Carlo Testing. Make tests flexible, so they can be tuned. Then run the test over and over—say on a test server—randomly changing the tuning values. If the tests ever fail, the code is broken. Make sure to start writing those tests early so a continuous integration server starts running them soon. By the way, make sure you carefully log the conditions under which the test failed. +- Monte Carlo Testing. Make tests flexible, so they can be tuned. Then run the test over and over—say on a test server—randomly changing the tuning values. If the tests ever fail, the code is broken. Make sure to start writing those tests early so a continuous integration server starts running them soon. By the way, make sure you carefully log the conditions under which the test failed. -• Run the test on every one of the target deployment platforms. Repeatedly. Continuously. The longer the tests run without failure, the more likely that +- Run the test on every one of the target deployment platforms. Repeatedly. Continuously. The longer the tests run without failure, the more likely that – The production code is correct or – The tests aren’t adequate to expose problems. -• Run the tests on a machine with varying loads. If you can simulate loads close to a production environment, do so. +- Run the tests on a machine with varying loads. If you can simulate loads close to a production environment, do so. Yet, even if you do all of these things, you still don’t stand a very good chance of finding threading problems with your code. The most insidious problems are the ones that have such a small cross section that they only occur once in a billion opportunities. Such problems are the terror of complex systems. @@ -835,11 +800,9 @@ We do not have any direct relationship with IBM or the team that developed ConTe Here’s an outline of how to use ConTest: -• Write tests and production code, making sure there are tests specifically designed to simulate multiple users under varying loads, as mentioned above. - -• Instrument test and production code with ConTest. - -• Run the tests. +- Write tests and production code, making sure there are tests specifically designed to simulate multiple users under varying loads, as mentioned above. +- Instrument test and production code with ConTest. +- Run the tests. When we instrumented code with ConTest, our success rate went from roughly one failure in ten million iterations to roughly one failure in thirty iterations. Here are the loop values for several runs of the test after instrumentation: 13, 23, 0, 54, 16, 14, 6, 69, 107, 49, 2. So clearly the instrumented classes failed much earlier and with much greater reliability. diff --git a/docs/ch1.md b/docs/ch1.md index 3415704..7e9c7ee 100644 --- a/docs/ch1.md +++ b/docs/ch1.md @@ -1,13 +1,12 @@ # 第 1 章 Clean Code -Image -Image +![](figures/ch1/1-1fig_martin.jpg) You are reading this book for two reasons. First, you are a programmer. Second, you want to be a better programmer. Good. We need better programmers. This is a book about good programming. It is filled with code. We are going to look at code from every different direction. We’ll look down at it from the top, up at it from the bottom, and through it from the inside out. By the time we are done, we’re going to know a lot about code. What’s more, we’ll be able to tell the difference between good code and bad code. We’ll know how to write good code. And we’ll know how to transform bad code into good code. -THERE WILL BE CODE +## THERE WILL BE CODE One might argue that a book about code is somehow behind the times—that code is no longer the issue; that we should be concerned about models and requirements instead. Indeed some have suggested that we are close to the end of code. That soon all code will be generated instead of written. That programmers simply won’t be needed because business people will generate programs from specifications. Nonsense! We will never be rid of code, because code represents the details of the requirements. At some level those details cannot be ignored or abstracted; they have to be specified. And specifying requirements in such detail that a machine can execute them is programming. Such a specification is code. @@ -20,14 +19,14 @@ This will never happen. Not even humans, with all their intuition and creativity Remember that code is really the language in which we ultimately express the requirements. We may create languages that are closer to the requirements. We may create tools that help us parse and assemble those requirements into formal structures. But we will never eliminate necessary precision—so there will always be code. -BAD CODE +## BAD CODE I was recently reading the preface to Kent Beck’s book Implementation Patterns.1 He says, “… this book is based on a rather fragile premise: that good code matters….” A fragile premise? I disagree! I think that premise is one of the most robust, supported, and overloaded of all the premises in our craft (and I think Kent knows it). We know good code matters because we’ve had to deal for so long with its lack. 1. [Beck07]. I know of one company that, in the late 80s, wrote a killer app. It was very popular, and lots of professionals bought and used it. But then the release cycles began to stretch. Bugs were not repaired from one release to the next. Load times grew and crashes increased. I remember the day I shut the product down in frustration and never used it again. The company went out of business a short time after that. -Image +![](figures/ch1/1-2fig_martin.jpg) Two decades later I met one of the early employees of that company and asked him what had happened. The answer confirmed my fears. They had rushed the product to market and had made a huge mess in the code. As they added more and more features, the code got worse and worse until they simply could not manage it any longer. It was the bad code that brought the company down. @@ -39,7 +38,7 @@ Were you trying to go fast? Were you in a rush? Probably so. Perhaps you felt th We’ve all looked at the mess we’ve just made and then have chosen to leave it for another day. We’ve all felt the relief of seeing our messy program work and deciding that a working mess is better than nothing. We’ve all said we’d go back and clean it up later. Of course, in those days we didn’t know LeBlanc’s law: Later equals never. -THE TOTAL COST OF OWNING A MESS +## THE TOTAL COST OF OWNING A MESS If you have been a programmer for more than two or three years, you have probably been significantly slowed down by someone else’s messy code. If you have been a programmer for longer than two or three years, you have probably been slowed down by messy code. The degree of the slowdown can be significant. Over the span of a year or two, teams that were moving very fast at the beginning of a project can find themselves moving at a snail’s pace. Every change they make to the code breaks two or three other parts of the code. No change is trivial. Every addition or modification to the system requires that the tangles, twists, and knots be “understood” so that more tangles, twists, and knots can be added. Over time the mess becomes so big and so deep and so tall, they can not clean it up. There is no way at all. As the mess builds, the productivity of the team continues to decrease, asymptotically approaching zero. As productivity decreases, management does the only thing they can; they add more staff to the project in hopes of increasing productivity. But that new staff is not versed in the design of the system. They don’t know the difference between a change that matches the design intent and a change that thwarts the design intent. Furthermore, they, and everyone else on the team, are under horrific pressure to increase productivity. So they all make more and more messes, driving the productivity ever further toward zero. (See Figure 1-1.) @@ -47,9 +46,9 @@ As the mess builds, the productivity of the team continues to decrease, asymptot Figure 1-1 Productivity vs. time -Image +![](figures/ch1/1-4fig_martin.jpg) -The Grand Redesign in the Sky +## The Grand Redesign in the Sky Eventually the team rebels. They inform management that they cannot continue to develop in this odious code base. They demand a redesign. Management does not want to expend the resources on a whole new redesign of the project, but they cannot deny that productivity is terrible. Eventually they bend to the demands of the developers and authorize the grand redesign in the sky. A new tiger team is selected. Everyone wants to be on this team because it’s a green-field project. They get to start over and create something truly beautiful. But only the best and brightest are chosen for the tiger team. Everyone else must continue to maintain the current system. @@ -60,7 +59,7 @@ This race can go on for a very long time. I’ve seen it take 10 years. And by t If you have experienced even one small part of the story I just told, then you already know that spending time keeping your code clean is not just cost effective; it’s a matter of professional survival. -Attitude +## Attitude Have you ever waded through a mess so grave that it took weeks to do what should have taken hours? Have you seen what should have been a one-line change, made instead in hundreds of different modules? These symptoms are all too common. Why does this happen to code? Why does good code rot so quickly into bad code? We have lots of explanations for it. We complain that the requirements changed in ways that thwart the original design. We bemoan the schedules that were too tight to do things right. We blather about stupid managers and intolerant customers and useless marketing types and telephone sanitizers. But the fault, dear Dilbert, is not in our stars, but in ourselves. We are unprofessional. @@ -77,12 +76,12 @@ To drive this point home, what if you were a doctor and had a patient who demand So too it is unprofessional for programmers to bend to the will of managers who don’t understand the risks of making messes. -The Primal Conundrum +## The Primal Conundrum Programmers face a conundrum of basic values. All developers with more than a few years experience know that previous messes slow them down. And yet all developers feel the pressure to make messes in order to meet deadlines. In short, they don’t take the time to go fast! True professionals know that the second part of the conundrum is wrong. You will not make the deadline by making the mess. Indeed, the mess will slow you down instantly, and will force you to miss the deadline. The only way to make the deadline—the only way to go fast—is to keep the code as clean as possible at all times. -The Art of Clean Code? +## The Art of Clean Code? Let’s say you believe that messy code is a significant impediment. Let’s say that you accept that the only way to go fast is to keep your code clean. Then you must ask yourself: “How do I write clean code?” It’s no good trying to write clean code if you don’t know what it means for code to be clean! The bad news is that writing clean code is a lot like painting a picture. Most of us know when a picture is painted well or badly. But being able to recognize good art from bad does not mean that we know how to paint. So too being able to recognize clean code from dirty code does not mean that we know how to write clean code! @@ -93,10 +92,10 @@ A programmer without “code-sense” can look at a messy module and recognize t In short, a programmer who writes clean code is an artist who can take a blank screen through a series of transformations until it is an elegantly coded system. -What Is Clean Code? +## What Is Clean Code? There are probably as many definitions as there are programmers. So I asked some very well-known and deeply experienced programmers what they thought. -Image +![](figures/ch1/1-5fig_martin.jpg) Bjarne Stroustrup, inventor of C++ and author of The C++ Programming Language @@ -116,7 +115,7 @@ Bjarne closes with the assertion that clean code does one thing well. It is no a Grady Booch, author of Object Oriented Analysis and Design with Applications -Image +![](figures/ch1/1-6fig_martin.jpg) Clean code is simple and direct. Clean code reads like well-written prose. Clean code never obscures the designer’s intent but rather is full of crisp abstractions and straightforward lines of control. @@ -128,7 +127,7 @@ I find Grady’s use of the phrase “crisp abstraction” to be a fascinating o “Big” Dave Thomas, founder of OTI, godfather of the Eclipse strategy -Image +![](figures/ch1/1-7fig_martin.jpg) Clean code can be read, and enhanced by a developer other than its original author. It has unit and acceptance tests. It has meaningful names. It provides one way rather than many ways for doing one thing. It has minimal dependencies, which are explicitly defined, and provides a clear and minimal API. Code should be literate since depending on the language, not all necessary information can be expressed clearly in code alone. @@ -144,7 +143,7 @@ Dave also says that code should be literate. This is a soft reference to Knuth Michael Feathers, author of Working Effectively with Legacy Code -Image +![](figures/ch1/1-8fig_martin.jpg) I could list all of the qualities that I notice in clean code, but there is one overarching quality that leads to all of them. Clean code always looks like it was written by someone who cares. There is nothing obvious that you can do to make it better. All of those things were thought about by the code’s author, and if you try to imagine improvements, you’re led back to where you are, sitting in appreciation of the code someone left for you—code left by someone who cares deeply about the craft. @@ -156,17 +155,14 @@ Ron Jeffries, author of Extreme Programming Installed and Extreme Programming Ad Ron began his career programming in Fortran at the Strategic Air Command and has written code in almost every language and on almost every machine. It pays to consider his words carefully. -Image +![](figures/ch1/1-9fig_martin.jpg) In recent years I begin, and nearly end, with Beck’s rules of simple code. In priority order, simple code: -• Runs all the tests; - -• Contains no duplication; - -• Expresses all the design ideas that are in the system; - -• Minimizes the number of entities such as classes, methods, functions, and the like. +- Runs all the tests; +- Contains no duplication; +- Expresses all the design ideas that are in the system; +- Minimizes the number of entities such as classes, methods, functions, and the like. Of these, I focus mostly on duplication. When the same thing is done over and over, it’s a sign that there is an idea in our mind that is not well represented in the code. I try to figure out what it is. Then I try to express that idea more clearly. @@ -186,7 +182,7 @@ Here, in a few short paragraphs, Ron has summarized the contents of this book. N Ward Cunningham, inventor of Wiki, inventor of Fit, coinventor of eXtreme Programming. Motive force behind Design Patterns. Smalltalk and OO thought leader. The godfather of all those who care about code. -Image +![](figures/ch1/1-10fig_martin.jpg) You know you are working on clean code when each routine you read turns out to be pretty much what you expected. You can call it beautiful code when the code also makes it look like the language was made for the problem. @@ -198,10 +194,10 @@ Ward expects that when you read clean code you won’t be surprised at all. Inde And what about Ward’s notion of beauty? We’ve all railed against the fact that our languages weren’t designed for our problems. But Ward’s statement puts the onus back on us. He says that beautiful code makes the language look like it was made for the problem! So it’s our responsibility to make the language look simple! Language bigots everywhere, beware! It is not the language that makes programs appear simple. It is the programmer that make the language appear simple! -SCHOOLS OF THOUGHT +## SCHOOLS OF THOUGHT What about me (Uncle Bob)? What do I think clean code is? This book will tell you, in hideous detail, what I and my compatriots think about clean code. We will tell you what we think makes a clean variable name, a clean function, a clean class, etc. We will present these opinions as absolutes, and we will not apologize for our stridence. To us, at this point in our careers, they are absolutes. They are our school of thought about clean code. -Image +![](figures/ch1/1-11fig_martin.jpg) Martial artists do not all agree about the best martial art, or the best technique within a martial art. Often master martial artists will form their own schools of thought and gather students to learn from them. So we see Gracie Jiu Jistu, founded and taught by the Gracie family in Brazil. We see Hakkoryu Jiu Jistu, founded and taught by Okuyama Ryuho in Tokyo. We see Jeet Kune Do, founded and taught by Bruce Lee in the United States. @@ -213,7 +209,7 @@ Consider this book a description of the Object Mentor School of Clean Code. The Indeed, many of the recommendations in this book are controversial. You will probably not agree with all of them. You might violently disagree with some of them. That’s fine. We can’t claim final authority. On the other hand, the recommendations in this book are things that we have thought long and hard about. We have learned them through decades of experience and repeated trial and error. So whether you agree or disagree, it would be a shame if you did not see, and respect, our point of view. -WE ARE AUTHORS +## WE ARE AUTHORS The @author field of a Javadoc tells us who we are. We are authors. And one thing about authors is that they have readers. Indeed, authors are responsible for communicating well with their readers. The next time you write a line of code, remember you are an author, writing for readers who will judge your effort. You might ask: How much is code really read? Doesn’t most of the effort go into writing it? @@ -245,7 +241,7 @@ Because this ratio is so high, we want the reading of code to be easy, even if i There is no escape from this logic. You cannot write code if you cannot read the surrounding code. The code you are trying to write today will be hard or easy to write depending on how hard or easy the surrounding code is to read. So if you want to go fast, if you want to get done quickly, if you want your code to be easy to write, make it easy to read. -THE BOY SCOUT RULE +## THE BOY SCOUT RULE It’s not enough to write the code well. The code has to be kept clean over time. We’ve all seen code rot and degrade as time passes. So we must take an active role in preventing this degradation. The Boy Scouts of America have a simple rule that we can apply to our profession. @@ -258,12 +254,12 @@ If we all checked-in our code a little cleaner than when we checked it out, the Can you imagine working on a project where the code simply got better as time passed? Do you believe that any other option is professional? Indeed, isn’t continuous improvement an intrinsic part of professionalism? -PREQUEL AND PRINCIPLES +## PREQUEL AND PRINCIPLES In many ways this book is a “prequel” to a book I wrote in 2002 entitled Agile Software Development: Principles, Patterns, and Practices (PPP). The PPP book concerns itself with the principles of object-oriented design, and many of the practices used by professional developers. If you have not read PPP, then you may find that it continues the story told by this book. If you have already read it, then you’ll find many of the sentiments of that book echoed in this one at the level of code. In this book you will find sporadic references to various principles of design. These include the Single Responsibility Principle (SRP), the Open Closed Principle (OCP), and the Dependency Inversion Principle (DIP) among others. These principles are described in depth in PPP. -CONCLUSION +## CONCLUSION Books on art don’t promise to make you an artist. All they can do is give you some of the tools, techniques, and thought processes that other artists have used. So too this book cannot promise to make you a good programmer. It cannot promise to give you “code-sense.” All it can do is show you the thought processes of good programmers and the tricks, techniques, and tools that they use. Just like a book on art, this book will be full of details. There will be lots of code. You’ll see good code and you’ll see bad code. You’ll see bad code transformed into good code. You’ll see lists of heuristics, disciplines, and techniques. You’ll see example after example. After that, it’s up to you. diff --git a/docs/ch10.md b/docs/ch10.md index 0a39b2b..1bd3fd8 100644 --- a/docs/ch10.md +++ b/docs/ch10.md @@ -1,7 +1,7 @@ # 第 10 章 Classes with Jeff Langr -Image +![](figures/ch10/10_1fig_martin.jpg) So far in this book we have focused on how to write lines and blocks of code well. We have delved into proper composition of functions and how they interrelate. But for all the attention to the expressiveness of code statements and the functions they comprise, we still don’t have clean code until we’ve paid attention to higher levels of code organization. Let’s talk about clean classes. diff --git a/docs/ch11.md b/docs/ch11.md index 4bb2cc8..9f4356f 100644 --- a/docs/ch11.md +++ b/docs/ch11.md @@ -1,7 +1,7 @@ # 第 11 章 Systems by Dr. Kevin Dean Wampler -Image +![](figures/ch11/11_1fig_martin.jpg) “Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build, and test.” @@ -54,7 +54,7 @@ Sometimes, of course, we need to make the application responsible for when an ob Figure 11-1 Separating construction in main() -Image +![](figures/ch11/11_2fig_martin.jpg) LineItem instances to add to an Order. In this case we can use the ABSTRACT FACTORY2 pattern to give the application control of when to build the LineItems, but keep the details of that construction separate from the application code. (See Figure 11-2.) @@ -63,7 +63,7 @@ LineItem instances to add to an Order. In this case we can use the ABSTRACT FACT Figure 11-2 Separation construction with factory -Image +![](figures/ch11/11_3fig_martin.jpg) Again notice that all the dependencies point from main toward the OrderProcessing application. This means that the application is decoupled from the details of how to build a LineItem. That capability is held in the LineItemFactoryImplementation, which is on the main side of the line. And yet the application is in complete control of when the LineItem instances get built and can even provide application-specific constructor arguments. @@ -328,7 +328,7 @@ Each “bean” is like one part of a nested “Russian doll,” with a domain o Figure 11-3 The “Russian doll” of decorators -Image +![](figures/ch11/11_4fig_martin.jpg) The client believes it is invoking getAccounts() on a Bank object, but it is actually talking to the outermost of a set of nested DECORATOR14 objects that extend the basic behavior of the Bank POJO. We could add other decorators for transactions, caching, and so forth. diff --git a/docs/ch12.md b/docs/ch12.md index 0eb6fca..94b733f 100644 --- a/docs/ch12.md +++ b/docs/ch12.md @@ -1,7 +1,7 @@ # 第 12 章 Emergence by Jeff Langr -Image +![](figures/ch12/12_1fig_martin.jpg) GETTING CLEAN VIA EMERGENT DESIGN What if there were four simple rules that you could follow that would help you create good designs as you worked? What if by following these rules you gained insights into the structure and design of your code, making it easier to apply principles such as SRP and DIP? What if these four rules facilitated the emergence of good designs? @@ -12,13 +12,10 @@ Many of us feel that Kent Beck’s four rules of Simple Design1 are of significa According to Kent, a design is “simple” if it follows these rules: -• Runs all the tests - -• Contains no duplication - -• Expresses the intent of the programmer - -• Minimizes the number of classes and methods +- Runs all the tests +- Contains no duplication +- Expresses the intent of the programmer +- Minimizes the number of classes and methods The rules are given in order of importance. diff --git a/docs/ch13.md b/docs/ch13.md index 24b0516..cf0fdd4 100644 --- a/docs/ch13.md +++ b/docs/ch13.md @@ -1,7 +1,7 @@ # 第 13 章 Concurrency by Brett L. Schuchert -Image +![](figures/ch13/13_1fig_martin.jpg) “Objects are abstractions of processing. Threads are abstractions of schedule.” @@ -33,26 +33,24 @@ Or consider a system that interprets large data sets but can only give a complet Myths and Misconceptions And so there are compelling reasons to adopt concurrency. However, as we said before, concurrency is hard. If you aren’t very careful, you can create some very nasty situations. Consider these common myths and misconceptions: -• Concurrency always improves performance. +- Concurrency always improves performance. Concurrency can sometimes improve performance, but only when there is a lot of wait time that can be shared between multiple threads or multiple processors. Neither situation is trivial. -• Design does not change when writing concurrent programs. +- Design does not change when writing concurrent programs. In fact, the design of a concurrent algorithm can be remarkably different from the design of a single-threaded system. The decoupling of what from when usually has a huge effect on the structure of the system. -• Understanding concurrency issues is not important when working with a container such as a Web or EJB container. +- Understanding concurrency issues is not important when working with a container such as a Web or EJB container. In fact, you’d better know just what your container is doing and how to guard against the issues of concurrent update and deadlock described later in this chapter. Here are a few more balanced sound bites regarding writing concurrent software: -• Concurrency incurs some overhead, both in performance as well as writing additional code. - -• Correct concurrency is complex, even for simple problems. - -• Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs2 instead of the true defects they are. +- Concurrency incurs some overhead, both in performance as well as writing additional code. +- Correct concurrency is complex, even for simple problems. +- Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs2 instead of the true defects they are. 2. Cosmic-rays, glitches, and so on. -• Concurrency often requires a fundamental change in design strategy. +- Concurrency often requires a fundamental change in design strategy. CHALLENGES What makes concurrent programming so difficult? Consider the following trivial class: @@ -67,11 +65,11 @@ What makes concurrent programming so difficult? Consider the following trivial c ``` Let’s say we create an instance of X, set the lastIdUsed field to 42, and then share the instance between two threads. Now suppose that both of those threads call the method getNextId(); there are three possible outcomes: -• Thread one gets the value 43, thread two gets the value 44, lastIdUsed is 44. +- Thread one gets the value 43, thread two gets the value 44, lastIdUsed is 44. -• Thread one gets the value 44, thread two gets the value 43, lastIdUsed is 44. +- Thread one gets the value 44, thread two gets the value 43, lastIdUsed is 44. -• Thread one gets the value 43, thread two gets the value 43, lastIdUsed is 43. +- Thread one gets the value 43, thread two gets the value 43, lastIdUsed is 43. The surprising third result3 occurs when the two threads step on each other. This happens because there are many possible paths that the two threads can take through that one line of Java code, and some of those paths generate incorrect results. How many different paths are there? To really answer that question, we need to understand what the Just-In-Time Compiler does with the generated byte-code, and understand what the Java memory model considers to be atomic. @@ -89,11 +87,9 @@ The SRP5 states that a given method/class/component should have a single reason 5. [PPP] -• Concurrency-related code has its own life cycle of development, change, and tuning. - -• Concurrency-related code has its own challenges, which are different from and often more difficult than nonconcurrency-related code. - -• The number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code. +- Concurrency-related code has its own life cycle of development, change, and tuning. +- Concurrency-related code has its own challenges, which are different from and often more difficult than nonconcurrency-related code. +- The number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code. Recommendation: Keep your concurrency-related code separate from other code.6 @@ -102,13 +98,12 @@ Recommendation: Keep your concurrency-related code separate from other code.6 Corollary: Limit the Scope of Data As we saw, two threads modifying the same field of a shared object can interfere with each other, causing unexpected behavior. One solution is to use the synchronized keyword to protect a critical section in the code that uses the shared object. It is important to restrict the number of such critical sections. The more places shared data can get updated, the more likely: -• You will forget to protect one or more of those places—effectively breaking all code that modifies that shared data. - -• There will be duplication of effort required to make sure everything is effectively guarded (violation of DRY7). +- You will forget to protect one or more of those places—effectively breaking all code that modifies that shared data. +- There will be duplication of effort required to make sure everything is effectively guarded (violation of DRY7). 7. [PRAG]. -• It will be difficult to determine the source of failures, which are already hard enough to find. +- It will be difficult to determine the source of failures, which are already hard enough to find. Recommendation: Take data encapsulation to heart; severely limit the access of any data that may be shared. @@ -127,13 +122,10 @@ Recommendation: Attempt to partition data into independent subsets than can be o KNOW YOUR LIBRARY Java 5 offers many improvements for concurrent development over previous versions. There are several things to consider when writing threaded code in Java 5: -• Use the provided thread-safe collections. - -• Use the executor framework for executing unrelated tasks. - -• Use nonblocking solutions when possible. - -• Several library classes are not thread safe. +- Use the provided thread-safe collections. +- Use the executor framework for executing unrelated tasks. +- Use nonblocking solutions when possible. +- Several library classes are not thread safe. Thread-Safe Collections When Java was young, Doug Lea wrote the seminal book8 Concurrent Programming in Java. Along with the book he developed several thread-safe collections, which later became part of the JDK in the java.util.concurrent package. The collections in that package are safe for multithreaded situations and they perform well. In fact, the ConcurrentHashMap implementation performs better than HashMap in nearly all situations. It also allows for simultaneous concurrent reads and writes, and it has methods supporting common composite operations that are otherwise not thread safe. If Java 5 is the deployment environment, start with ConcurrentHashMap. @@ -142,14 +134,14 @@ When Java was young, Doug Lea wrote the seminal book8 Concurrent Programming in There are several other kinds of classes added to support advanced concurrency design. Here are a few examples: -image +![](figures/ch13/t0183-01.jpg) Recommendation: Review the classes available to you. In the case of Java, become familiar with java.util.concurrent, java.util.concurrent.atomic, java.util.concurrent.locks. KNOW YOUR EXECUTION MODELS There are several different ways to partition behavior in a concurrent application. To discuss them we need to understand some basic definitions. -image +![](figures/ch13/t0183-02.jpg) Given these definitions, we can now discuss the various execution models used in concurrent programming. @@ -185,11 +177,11 @@ Recommendation: Avoid using more than one method on a shared object. There will be times when you must use more than one method on a shared object. When this is the case, there are three ways to make the code correct: -• Client-Based Locking—Have the client lock the server before calling the first method and make sure the lock’s extent includes code calling the last method. +- Client-Based Locking—Have the client lock the server before calling the first method and make sure the lock’s extent includes code calling the last method. -• Server-Based Locking—Within the server create a method that locks the server, calls all the methods, and then unlocks. Have the client call the new method. +- Server-Based Locking—Within the server create a method that locks the server, calls all the methods, and then unlocks. Have the client call the new method. -• Adapted Server—create an intermediary that performs the locking. This is an example of server-based locking, where the original server cannot be changed. +- Adapted Server—create an intermediary that performs the locking. This is an example of server-based locking, where the original server cannot be changed. KEEP SYNCHRONIZED SECTIONS SMALL The synchronized keyword introduces a lock. All sections of code guarded by the same lock are guaranteed to have only one thread executing through them at any given time. Locks are expensive because they create delays and add overhead. So we don’t want to litter our code with synchronized statements. On the other hand, critical sections13 must be guarded. So we want to design our code with as few critical sections as possible. @@ -224,19 +216,13 @@ Recommendation: Write tests that have the potential to expose problems and then That is a whole lot to take into consideration. Here are a few more fine-grained recommendations: -• Treat spurious failures as candidate threading issues. - -• Get your nonthreaded code working first. - -• Make your threaded code pluggable. - -• Make your threaded code tunable. - -• Run with more threads than processors. - -• Run on different platforms. - -• Instrument your code to try and force failures. +- Treat spurious failures as candidate threading issues. +- Get your nonthreaded code working first. +- Make your threaded code pluggable. +- Make your threaded code tunable. +- Run with more threads than processors. +- Run on different platforms. +- Instrument your code to try and force failures. Treat Spurious Failures as Candidate Threading Issues Threaded code causes things to fail that “simply cannot fail.” Most developers do not have an intuitive feel for how threading interacts with other code (authors included). Bugs in threaded code might exhibit their symptoms once in a thousand, or a million, executions. Attempts to repeat the systems can be frustratingly. This often leads developers to write off the failure as a cosmic ray, a hardware glitch, or some other kind of “one-off.” It is best to assume that one-offs do not exist. The longer these “one-offs” are ignored, the more code is built on top of a potentially faulty approach. @@ -251,13 +237,10 @@ Recommendation: Do not try to chase down nonthreading bugs and threading bugs at Make Your Threaded Code Pluggable Write the concurrency-supporting code such that it can be run in several configurations: -• One thread, several threads, varied as it executes - -• Threaded code interacts with something that can be both real or a test double. - -• Execute with test doubles that run quickly, slowly, variable. - -• Configure tests so they can run for a number of iterations. +- One thread, several threads, varied as it executes +- Threaded code interacts with something that can be both real or a test double. +- Execute with test doubles that run quickly, slowly, variable. +- Configure tests so they can run for a number of iterations. Recommendation: Make your thread-based code especially pluggable so that you can run it in various configurations. @@ -287,9 +270,8 @@ Each of these methods can affect the order of execution, thereby increasing the There are two options for code instrumentation: -• Hand-coded - -• Automated +- Hand-coded +- Automated Hand-Coded You can insert calls to wait(), sleep(), yield(), and priority() in your code by hand. It might be just the thing to do when you’re testing a particularly thorny piece of code. @@ -312,13 +294,10 @@ The inserted call to yield() will change the execution pathways taken by the cod There are many problems with this approach: -• You have to manually find appropriate places to do this. - -• How do you know where to put the call and what kind of call to use? - -• Leaving such code in a production environment unnecessarily slows the code down. - -• It’s a shotgun approach. You may or may not find flaws. Indeed, the odds aren’t with you. +- You have to manually find appropriate places to do this. +- How do you know where to put the call and what kind of call to use? +- Leaving such code in a production environment unnecessarily slows the code down. +- It’s a shotgun approach. You may or may not find flaws. Indeed, the odds aren’t with you. What we need is a way to do this during testing but not in production. We also need to easily mix up configurations between different runs, which results in increased chances of finding errors in the aggregate. diff --git a/docs/ch14.md b/docs/ch14.md index 2eb05d2..87c8242 100644 --- a/docs/ch14.md +++ b/docs/ch14.md @@ -1,7 +1,7 @@ # 第 14 章 Successive Refinement Case Study of a Command-Line Argument Parser -Image +![](figures/ch14/14_1fig_martin.jpg) This chapter is a case study in successive refinement. You will see a module that started well but did not scale. Then you will see how the module was refactored and cleaned. diff --git a/docs/ch15.md b/docs/ch15.md index 38e5490..766698c 100644 --- a/docs/ch15.md +++ b/docs/ch15.md @@ -1,5 +1,5 @@ # 第 15 章 JUnit Internals -Image +![](figures/ch15/15_1fig_martin.jpg) JUnit is one of the most famous of all Java frameworks. As frameworks go, it is simple in conception, precise in definition, and elegant in implementation. But what does the code look like? In this chapter we’ll critique an example drawn from the JUnit framework. diff --git a/docs/ch16.md b/docs/ch16.md index d67c057..7ea7132 100644 --- a/docs/ch16.md +++ b/docs/ch16.md @@ -1,5 +1,5 @@ # 第 16 章 Refactoring SerialDate -Image +![](figures/ch16/16_1fig_martin.jpg) If you go to http://www.jfree.org/jcommon/index.php, you will find the JCommon library. Deep within that library there is a package named org.jfree.date. Within that package there is a class named SerialDate. We are going to explore that class. diff --git a/docs/ch17.md b/docs/ch17.md index d082f9d..9f24188 100644 --- a/docs/ch17.md +++ b/docs/ch17.md @@ -1,9 +1,6 @@ # 第 17 章 Smells and Heuristics -Image -Image - -Image +![](figures/ch17/17_1fig_martin.jpg) In his wonderful book Refactoring,1 Martin Fowler identified many different “Code Smells.” The list that follows includes many of Martin’s smells and adds many more of my own. It also includes other pearls and heuristics that I use to practice my trade. @@ -319,7 +316,7 @@ The simple use of explanatory variables makes it clear that the first matched gr It is hard to overdo this. More explanatory variables are generally better than fewer. It is remarkable how an opaque module can suddenly become transparent simply by breaking the calculations up into well-named intermediate values. G20: Function Names Should Say What They Do -Image + Look at this code: ```java @@ -416,7 +413,7 @@ Everyone on the team should follow these conventions. This means that each team If you would like to know what conventions I follow, you’ll see them in the refactored code in Listing B-7 on page 394, through Listing B-14. G25: Replace Magic Numbers with Named Constants -Image + This is probably one of the oldest rules in software development. I remember reading it in the late sixties in introductory COBOL, FORTRAN, and PL/1 manuals. In general it is a bad idea to have raw numbers in your code. You should hide them behind well-named constants. @@ -454,7 +451,7 @@ When you make a decision in your code, make sure you make it precisely. Know why Ambiguities and imprecision in code are either a result of disagreements or laziness. In either case they should be eliminated. G27: Structure over Convention -Image + Enforce design decisions with structure over convention. Naming conventions are good, but they are inferior to structures that force compliance. For example, switch/cases with nicely named enumerations are inferior to base classes with abstract methods. No one is forced to implement the switch/case statement the same way each time; but the base classes do enforce that concrete classes have all abstract methods implemented. diff --git a/docs/ch2.md b/docs/ch2.md index f946ede..05f366b 100644 --- a/docs/ch2.md +++ b/docs/ch2.md @@ -1,17 +1,19 @@ # 第 2 章 Meaningful Names -by Tim Ottinger -Image -INTRODUCTION +![](figures/ch2/2_1fig_martin.jpg) + +by Tim Ottinger + +## INTRODUCTION Names are everywhere in software. We name our variables, our functions, our arguments, classes, and packages. We name our source files and the directories that contain them. We name our jar files and war files and ear files. We name and name and name. Because we do so much of it, we’d better do it well. What follows are some simple rules for creating good names. -USE INTENTION-REVEALING NAMES +## USE INTENTION-REVEALING NAMES It is easy to say that names should reveal intent. What we want to impress upon you is that we are serious about this. Choosing good names takes time but saves more than it takes. So take care with your names and change them when you find better ones. Everyone who reads your code (including you) will be happier if you do. The name of a variable, function, or class, should answer all the big questions. It should tell you why it exists, what it does, and how it is used. If a name requires a comment, then the name does not reveal its intent. ```java int d; // elapsed time in days - +``` The name d reveals nothing. It does not evoke a sense of elapsed time, nor of days. We should choose a name that specifies what is being measured and the unit of that measurement: ```java int elapsedTimeInDays; @@ -34,11 +36,8 @@ Why is it hard to tell what this code is doing? There are no complex expressions The problem isn’t the simplicity of the code but the implicity of the code (to coin a phrase): the degree to which the context is not explicit in the code itself. The code implicitly requires that we know the answers to questions such as: 1. What kinds of things are in theList? - 2. What is the significance of the zeroth subscript of an item in theList? - 3. What is the significance of the value 4? - 4. How would I use the list being returned? The answers to these questions are not present in the code sample, but they could have been. Say that we’re working in a mine sweeper game. We find that the board is a list of cells called theList. Let’s rename that to gameBoard. @@ -67,7 +66,7 @@ We can go further and write a simple class for cells instead of using an array o ``` With these simple name changes, it’s not difficult to understand what’s going on. This is the power of choosing good names. -AVOID DISINFORMATION +## AVOID DISINFORMATION Programmers must avoid leaving false clues that obscure the meaning of code. We should avoid words whose entrenched meanings vary from our intended meaning. For example, hp, aix, and sco would be poor variable names because they are the names of Unix platforms or variants. Even if you are coding a hypotenuse and hp looks like a good abbreviation, it could be disinformative. Do not refer to a grouping of accounts as an accountList unless it’s actually a List. The word list means something specific to programmers. If the container holding the accounts is not actually a List, it may lead to false conclusions.1 So accountGroup or bunchOfAccounts or just plain accounts would be better. @@ -88,8 +87,8 @@ A truly awful example of disinformative names would be the use of lower-case L o ``` The reader may think this a contrivance, but we have examined code where such things were abundant. In one case the author of the code suggested using a different font so that the differences were more obvious, a solution that would have to be passed down to all future developers as oral tradition or in a written document. The problem is conquered with finality and without creating new work products by a simple renaming. -MAKE MEANINGFUL DISTINCTIONS -Image +## MAKE MEANINGFUL DISTINCTIONS +![](figures/ch2/2_2fig_martin.jpg) Programmers create problems for themselves when they write code solely to satisfy a compiler or interpreter. For example, because you can’t use the same name to refer to two different things in the same scope, you might be tempted to change one name in an arbitrary way. Sometimes this is done by misspelling one, leading to the surprising situation where correcting spelling errors leads to an inability to compile.2 @@ -125,7 +124,7 @@ How are the programmers in this project supposed to know which of these function In the absence of specific conventions, the variable moneyAmount is indistinguishable from money, customerInfo is indistinguishable from customer, accountData is indistinguishable from account, and theMessage is indistinguishable from message. Distinguish names in such a way that the reader knows what the differences offer. -USE PRONOUNCEABLE NAMES +## USE PRONOUNCEABLE NAMES Humans are good at words. A significant part of our brains is dedicated to the concept of words. And words are, by definition, pronounceable. It would be a shame not to take advantage of that huge portion of our brains that has evolved to deal with spoken language. So make your names pronounceable. If you can’t pronounce it, you can’t discuss it without sounding like an idiot. “Well, over here on the bee cee arr three cee enn tee we have a pee ess zee kyew int, see?” This matters because programming is a social activity. @@ -150,7 +149,7 @@ to ``` Intelligent conversation is now possible: “Hey, Mikey, take a look at this record! The generation timestamp is set to tomorrow’s date! How can that be?” -USE SEARCHABLE NAMES +## USE SEARCHABLE NAMES Single-letter names and numeric constants have a particular problem in that they are not easy to locate across a body of text. One might easily grep for MAX_CLASSES_PER_STUDENT, but the number 7 could be more troublesome. Searches may turn up the digit as part of file names, other constant definitions, and in various expressions where the value is used with different intent. It is even worse when a constant is a long number and someone might have transposed digits, thereby creating a bug while simultaneously evading the programmer’s search. @@ -176,10 +175,10 @@ to ``` Note that sum, above, is not a particularly useful name but at least is searchable. The intentionally named code makes for a longer function, but consider how much easier it will be to find WORK_DAYS_PER_WEEK than to find all the places where 5 was used and filter the list down to just the instances with the intended meaning. -AVOID ENCODINGS +## AVOID ENCODINGS We have enough encodings to deal with without adding more to our burden. Encoding type or scope information into names simply adds an extra burden of deciphering. It hardly seems reasonable to require each new employee to learn yet another encoding “language” in addition to learning the (usually considerable) body of code that they’ll be working in. It is an unnecessary mental burden when trying to solve a problem. Encoded names are seldom pronounceable and are easy to mis-type. -Hungarian Notation +## Hungarian Notation In days of old, when we worked in name-length-challenged languages, we violated this rule out of necessity, and with regret. Fortran forced encodings by making the first letter a code for the type. Early versions of BASIC allowed only a letter plus one digit. Hungarian Notation (HN) took this to a whole new level. HN was considered to be pretty important back in the Windows C API, when everything was an integer handle or a long pointer or a void pointer, or one of several implementations of “string” (with different uses and attributes). The compiler did not check types in those days, so the programmers needed a crutch to help them remember the types. @@ -191,7 +190,7 @@ Java programmers don’t need type encoding. Objects are strongly typed, and edi PhoneNumber phoneString; // name not changed when type changed! ``` -Member Prefixes +## Member Prefixes You also don’t need to prefix member variables with m_ anymore. Your classes and functions should be small enough that you don’t need them. And you should be using an editing environment that highlights or colorizes members to make them distinct. ```java public class Part { @@ -210,10 +209,10 @@ You also don’t need to prefix member variables with m_ anymore. Your classes a ``` Besides, people quickly learn to ignore the prefix (or suffix) to see the meaningful part of the name. The more we read the code, the less we see the prefixes. Eventually the prefixes become unseen clutter and a marker of older code. -Interfaces and Implementations +## Interfaces and Implementations These are sometimes a special case for encodings. For example, say you are building an ABSTRACT FACTORY for the creation of shapes. This factory will be an interface and will be implemented by a concrete class. What should you name them? IShapeFactory and ShapeFactory? I prefer to leave interfaces unadorned. The preceding I, so common in today’s legacy wads, is a distraction at best and too much information at worst. I don’t want my users knowing that I’m handing them an interface. I just want them to know that it’s a ShapeFactory. So if I must encode either the interface or the implementation, I choose the implementation. Calling it ShapeFactoryImp, or even the hideous CShapeFactory, is preferable to encoding the interface. -AVOID MENTAL MAPPING +## AVOID MENTAL MAPPING Readers shouldn’t have to mentally translate your names into other names they already know. This problem generally arises from a choice to use neither problem domain terms nor solution domain terms. This is a problem with single-letter variable names. Certainly a loop counter may be named i or j or k (though never l!) if its scope is very small and no other names can conflict with it. This is because those single-letter names for loop counters are traditional. However, in most other contexts a single-letter name is a poor choice; it’s just a place holder that the reader must mentally map to the actual concept. There can be no worse reason for using the name c than because a and b were already taken. @@ -222,10 +221,10 @@ In general programmers are pretty smart people. Smart people sometimes like to s One difference between a smart programmer and a professional programmer is that the professional understands that clarity is king. Professionals use their powers for good and write code that others can understand. -CLASS NAMES +## CLASS NAMES Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb. -METHOD NAMES +## METHOD NAMES Methods should have verb or verb phrase names like postPayment, deletePage, or save. Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is according to the javabean standard.4 4. http://java.sun.com/products/javabeans/docs/spec.html @@ -244,16 +243,16 @@ is generally better than ``` Consider enforcing their use by making the corresponding constructors private. -DON’T BE CUTE +## DON’T BE CUTE If names are too clever, they will be memorable only to people who share the author’s sense of humor, and only as long as these people remember the joke. Will they know what the function named HolyHandGrenade is supposed to do? Sure, it’s cute, but maybe in this case DeleteItems might be a better name. Choose clarity over entertainment value. -Image +![](figures/ch2/2_3fig_martin.jpg) Cuteness in code often appears in the form of colloquialisms or slang. For example, don’t use the name whack() to mean kill(). Don’t tell little culture-dependent jokes like eatMyShorts() to mean abort(). Say what you mean. Mean what you say. -PICK ONE WORD PER CONCEPT +## PICK ONE WORD PER CONCEPT Pick one word for one abstract concept and stick with it. For instance, it’s confusing to have fetch, retrieve, and get as equivalent methods of different classes. How do you remember which method name goes with which class? Sadly, you often have to remember which company, group, or individual wrote the library or class in order to remember which term was used. Otherwise, you spend an awful lot of time browsing through headers and previous code samples. Modern editing environments like Eclipse and IntelliJ provide context-sensitive clues, such as the list of methods you can call on a given object. But note that the list doesn’t usually give you the comments you wrote around your function names and parameter lists. You are lucky if it gives the parameter names from function declarations. The function names have to stand alone, and they have to be consistent in order for you to pick the correct method without any additional exploration. @@ -262,7 +261,7 @@ Likewise, it’s confusing to have a controller and a manager and a driver in th A consistent lexicon is a great boon to the programmers who must use your code. -DON’T PUN +## DON’T PUN Avoid using the same word for two purposes. Using the same term for two different ideas is essentially a pun. If you follow the “one word per concept” rule, you could end up with many classes that have, for example, an add method. As long as the parameter lists and return values of the various add methods are semantically equivalent, all is well. @@ -271,17 +270,17 @@ However one might decide to use the word add for “consistency” when he or sh Our goal, as authors, is to make our code as easy as possible to understand. We want our code to be a quick skim, not an intense study. We want to use the popular paperback model whereby the author is responsible for making himself clear and not the academic model where it is the scholar’s job to dig the meaning out of the paper. -USE SOLUTION DOMAIN NAMES +## USE SOLUTION DOMAIN NAMES Remember that the people who read your code will be programmers. So go ahead and use computer science (CS) terms, algorithm names, pattern names, math terms, and so forth. It is not wise to draw every name from the problem domain because we don’t want our coworkers to have to run back and forth to the customer asking what every name means when they already know the concept by a different name. The name AccountVisitor means a great deal to a programmer who is familiar with the VISITOR pattern. What programmer would not know what a JobQueue was? There are lots of very technical things that programmers have to do. Choosing technical names for those things is usually the most appropriate course. -USE PROBLEM DOMAIN NAMES +## USE PROBLEM DOMAIN NAMES When there is no “programmer-eese” for what you’re doing, use the name from the problem domain. At least the programmer who maintains your code can ask a domain expert what it means. Separating solution and problem domain concepts is part of the job of a good programmer and designer. The code that has more to do with problem domain concepts should have names drawn from the problem domain. -ADD MEANINGFUL CONTEXT +## ADD MEANINGFUL CONTEXT There are a few names which are meaningful in and of themselves—most are not. Instead, you need to place names in context for your reader by enclosing them in well-named classes, functions, or namespaces. When all else fails, then prefixing the name may be necessary as a last resort. Imagine that you have variables named firstName, lastName, street, houseNumber, city, state, and zipcode. Taken together it’s pretty clear that they form an address. But what if you just saw the state variable being used alone in a method? Would you automatically infer that it was part of an address? @@ -361,7 +360,7 @@ Listing 2-2 Variables have a context. } } ``` -DON’T ADD GRATUITOUS CONTEXT +## DON’T ADD GRATUITOUS CONTEXT In an imaginary application called “Gas Station Deluxe,” it is a bad idea to prefix every class with GSD. Frankly, you are working against your tools. You type G and press the completion key and are rewarded with a mile-long list of every class in the system. Is that wise? Why make it hard for the IDE to help you? Likewise, say you invented a MailingAddress class in GSD’s accounting module, and you named it GSDAccountAddress. Later, you need a mailing address for your customer contact application. Do you use GSDAccountAddress? Does it sound like the right name? Ten of 17 characters are redundant or irrelevant. @@ -370,7 +369,7 @@ Shorter names are generally better than longer ones, so long as they are clear. The names accountAddress and customerAddress are fine names for instances of the class Address but could be poor names for classes. Address is a fine name for a class. If I need to differentiate between MAC addresses, port addresses, and Web addresses, I might consider PostalAddress, MAC, and URI. The resulting names are more precise, which is the point of all naming. -FINAL WORDS +## FINAL WORDS The hardest thing about choosing good names is that it requires good descriptive skills and a shared cultural background. This is a teaching issue rather than a technical, business, or management issue. As a result many people in this field don’t learn to do it very well. People are also afraid of renaming things for fear that some other developers will object. We do not share that fear and find that we are actually grateful when names change (for the better). Most of the time we don’t really memorize the names of classes and methods. We use the modern tools to deal with details like that so we can focus on whether the code reads like paragraphs and sentences, or at least like tables and data structure (a sentence isn’t always the best way to display data). You will probably end up surprising someone when you rename, just like you might with any other code improvement. Don’t let it stop you in your tracks. diff --git a/docs/ch3.md b/docs/ch3.md index 1d2b33b..7109a7a 100644 --- a/docs/ch3.md +++ b/docs/ch3.md @@ -1,7 +1,6 @@ # 第 3 章 Functions -Image -Image +![](figures/ch3/3_1fig_martin.jpg) In the early days of programming we composed our systems of routines and subroutines. Then, in the era of Fortran and PL/1 we composed our systems of programs, subprograms, and functions. Nowadays only the function survives from those early days. Functions are the first line of organization in any program. Writing them well is the topic of this chapter. @@ -106,7 +105,7 @@ Unless you are a student of FitNesse, you probably don’t understand all the de So what is it that makes a function like Listing 3-2 easy to read and understand? How can we make a function communicate its intent? What attributes can we give our functions that will allow a casual reader to intuit the kind of program they live inside? -SMALL! +## SMALL! The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that. This is not an assertion that I can justify. I can’t provide any references to research that shows that very small functions are better. What I can tell you is that for nearly four decades I have written functions of all different sizes. I’ve written several nasty 3,000-line abominations. I’ve written scads of functions in the 100 to 300 line range. And I’ve written functions that were 20 to 30 lines long. What this experience has taught me, through long trial and error, is that functions should be very small. In the eighties we used to say that a function should be no bigger than a screen-full. Of course we said that at a time when VT100 screens were 24 lines by 80 columns, and our editors used 4 lines for administrative purposes. Nowadays with a cranked-down font and a nice big monitor, you can fit 150 characters on a line and a 100 lines or more on a screen. Lines should not be 150 characters long. Functions should not be 100 lines long. Functions should hardly ever be 20 lines long. @@ -128,26 +127,24 @@ Listing 3-3 HtmlUtil.java (re-refactored) return pageData.getHtml(); } ``` -Blocks and Indenting +## Blocks and Indenting This implies that the blocks within if statements, else statements, while statements, and so on should be one line long. Probably that line should be a function call. Not only does this keep the enclosing function small, but it also adds documentary value because the function called within the block can have a nicely descriptive name. This also implies that functions should not be large enough to hold nested structures. Therefore, the indent level of a function should not be greater than one or two. This, of course, makes the functions easier to read and understand. -DO ONE THING +## DO ONE THING It should be very clear that Listing 3-1 is doing lots more than one thing. It’s creating buffers, fetching pages, searching for inherited pages, rendering paths, appending arcane strings, and generating HTML, among other things. Listing 3-1 is very busy doing lots of different things. On the other hand, Listing 3-3 is doing one simple thing. It’s including setups and teardowns into test pages. The following advice has appeared in one form or another for 30 years or more. -Image +![](figures/ch3/3_2fig_martin.jpg) FUNCTIONS SHOULD DO ONE THING. THEY SHOULD DO IT WELL. THEY SHOULD DO IT ONLY. The problem with this statement is that it is hard to know what “one thing” is. Does Listing 3-3 do one thing? It’s easy to make the case that it’s doing three things: 1. Determining whether the page is a test page. - 2. If so, including setups and teardowns. - 3. Rendering the page in HTML. So which is it? Is the function doing one thing or three things? Notice that the three steps of the function are one level of abstraction below the stated name of the function. We can describe the function by describing it as a brief TO4 paragraph: @@ -162,15 +159,15 @@ It should be very clear that Listing 3-1 contains steps at many different levels So, another way to know that a function is doing more than “one thing” is if you can extract another function from it with a name that is not merely a restatement of its implementation [G34]. -Sections within Functions +## Sections within Functions Look at Listing 4-7 on page 71. Notice that the generatePrimes function is divided into sections such as declarations, initializations, and sieve. This is an obvious symptom of doing more than one thing. Functions that do one thing cannot be reasonably divided into sections. -ONE LEVEL OF ABSTRACTION PER FUNCTION +## ONE LEVEL OF ABSTRACTION PER FUNCTION In order to make sure our functions are doing “one thing,” we need to make sure that the statements within our function are all at the same level of abstraction. It is easy to see how Listing 3-1 violates this rule. There are concepts in there that are at a very high level of abstraction, such as getHtml(); others that are at an intermediate level of abstraction, such as: String pagePathName = PathParser.render(pagePath); and still others that are remarkably low level, such as: .append(”\n”). Mixing levels of abstraction within a function is always confusing. Readers may not be able to tell whether a particular expression is an essential concept or a detail. Worse, like broken windows, once details are mixed with essential concepts, more and more details tend to accrete within the function. -Reading Code from Top to Bottom: The Stepdown Rule +## Reading Code from Top to Bottom: The Stepdown Rule We want the code to read like a top-down narrative.5 We want every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions. I call this The Step-down Rule. 5. [KP78], p. 37. @@ -189,7 +186,7 @@ It turns out to be very difficult for programmers to learn to follow this rule a Take a look at Listing 3-7 at the end of this chapter. It shows the whole testableHtml function refactored according to the principles described here. Notice how each function introduces the next, and each function remains at a consistent level of abstraction. -SWITCH STATEMENTS +## SWITCH STATEMENTS It’s hard to make a small switch statement.6 Even a switch statement with only two cases is larger than I’d like a single block or function to be. It’s also hard to make a switch statement that does one thing. By their nature, switch statements always do N things. Unfortunately we can’t always avoid switch statements, but we can make sure that each switch statement is buried in a low-level class and is never repeated. We do this, of course, with polymorphism. 6. And, of course, I include if/else chains in this. @@ -266,7 +263,7 @@ Listing 3-5 Employee and Factory } } ``` -USE DESCRIPTIVE NAMES +## USE DESCRIPTIVE NAMES In Listing 3-7 I changed the name of our example function from testableHtml to SetupTeardownIncluder.render. This is a far better name because it better describes what the function does. I also gave each of the private methods an equally descriptive name such as isTestable or includeSetupAndTeardownPages. It is hard to overestimate the value of good names. Remember Ward’s principle: “You know you are working on clean code when each routine turns out to be pretty much what you expected.” Half the battle to achieving that principle is choosing good names for small functions that do one thing. The smaller and more focused a function is, the easier it is to choose a descriptive name. Don’t be afraid to make a name long. A long descriptive name is better than a short enigmatic name. A long descriptive name is better than a long descriptive comment. Use a naming convention that allows multiple words to be easily read in the function names, and then make use of those multiple words to give the function a name that says what it does. @@ -277,10 +274,10 @@ Choosing descriptive names will clarify the design of the module in your mind an Be consistent in your names. Use the same phrases, nouns, and verbs in the function names you choose for your modules. Consider, for example, the names includeSetup-AndTeardownPages, includeSetupPages, includeSuiteSetupPage, and includeSetupPage. The similar phraseology in those names allows the sequence to tell a story. Indeed, if I showed you just the sequence above, you’d ask yourself: “What happened to includeTeardownPages, includeSuiteTeardownPage, and includeTeardownPage?” How’s that for being “… pretty much what you expected.” -FUNCTION ARGUMENTS +## FUNCTION ARGUMENTS The ideal number of arguments for a function is zero (niladic). Next comes one (monadic), followed closely by two (dyadic). Three arguments (triadic) should be avoided where possible. More than three (polyadic) requires very special justification—and then shouldn’t be used anyway. -Image +![](figures/ch3/3_3fig_martin.jpg) Arguments are hard. They take a lot of conceptual power. That’s why I got rid of almost all of them from the example. Consider, for instance, the StringBuffer in the example. We could have passed it around as an argument rather than making it an instance variable, but then our readers would have had to interpret it each time they saw it. When you are reading the story told by the module, includeSetupPage() is easier to understand than includeSetupPageInto(newPage-Content). The argument is at a different level of abstraction than the function name and forces you to know a detail (in other words, StringBuffer) that isn’t particularly important at that point. @@ -290,19 +287,19 @@ Output arguments are harder to understand than input arguments. When we read a f One input argument is the next best thing to no arguments. SetupTeardown-Includer.render(pageData) is pretty easy to understand. Clearly we are going to render the data in the pageData object. -Common Monadic Forms +## Common Monadic Forms There are two very common reasons to pass a single argument into a function. You may be asking a question about that argument, as in boolean fileExists(“MyFile”). Or you may be operating on that argument, transforming it into something else and returning it. For example, InputStream fileOpen(“MyFile”) transforms a file name String into an InputStream return value. These two uses are what readers expect when they see a function. You should choose names that make the distinction clear, and always use the two forms in a consistent context. (See Command Query Separation below.) A somewhat less common, but still very useful form for a single argument function, is an event. In this form there is an input argument but no output argument. The overall program is meant to interpret the function call as an event and use the argument to alter the state of the system, for example, void passwordAttemptFailedNtimes(int attempts). Use this form with care. It should be very clear to the reader that this is an event. Choose names and contexts carefully. Try to avoid any monadic functions that don’t follow these forms, for example, void includeSetupPageInto(StringBuffer pageText). Using an output argument instead of a return value for a transformation is confusing. If a function is going to transform its input argument, the transformation should appear as the return value. Indeed, StringBuffer transform(StringBuffer in) is better than void transform(StringBuffer out), even if the implementation in the first case simply returns the input argument. At least it still follows the form of a transformation. -Flag Arguments +## Flag Arguments Flag arguments are ugly. Passing a boolean into a function is a truly terrible practice. It immediately complicates the signature of the method, loudly proclaiming that this function does more than one thing. It does one thing if the flag is true and another if the flag is false! In Listing 3-7 we had no choice because the callers were already passing that flag in, and I wanted to limit the scope of refactoring to the function and below. Still, the method call render(true) is just plain confusing to a poor reader. Mousing over the call and seeing render(boolean isSuite) helps a little, but not that much. We should have split the function into two: renderForSuite() and renderForSingleTest(). -Dyadic Functions +## Dyadic Functions A function with two arguments is harder to understand than a monadic function. For example, writeField(name) is easier to understand than writeField(output-Stream, name).10 Though the meaning of both is clear, the first glides past the eye, easily depositing its meaning. The second requires a short pause until we learn to ignore the first parameter. And that, of course, eventually results in problems because we should never ignore any part of code. The parts we ignore are where the bugs will hide. 10. I just finished refactoring a module that used the dyadic form. I was able to make the outputStream a field of the class and convert all the writeField calls to the monadic form. The result was much cleaner. @@ -313,14 +310,14 @@ Even obvious dyadic functions like assertEquals(expected, actual) are problemati Dyads aren’t evil, and you will certainly have to write them. However, you should be aware that they come at a cost and should take advantage of what mechanisms may be available to you to convert them into monads. For example, you might make the writeField method a member of outputStream so that you can say outputStream. writeField(name). Or you might make the outputStream a member variable of the current class so that you don’t have to pass it. Or you might extract a new class like FieldWriter that takes the outputStream in its constructor and has a write method. -Triads +## Triads Functions that take three arguments are significantly harder to understand than dyads. The issues of ordering, pausing, and ignoring are more than doubled. I suggest you think very carefully before creating a triad. For example, consider the common overload of assertEquals that takes three arguments: assertEquals(message, expected, actual). How many times have you read the message and thought it was the expected? I have stumbled and paused over that particular triad many times. In fact, every time I see it, I do a double-take and then learn to ignore the message. On the other hand, here is a triad that is not quite so insidious: assertEquals(1.0, amount, .001). Although this still requires a double-take, it’s one that’s worth taking. It’s always good to be reminded that equality of floating point values is a relative thing. -Argument Objects +## Argument Objects When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own. Consider, for example, the difference between the two following declarations: ```java Circle makeCircle(double x, double y, double radius); @@ -328,7 +325,7 @@ When a function seems to need more than two or three arguments, it is likely tha ``` Reducing the number of arguments by creating objects out of them may seem like cheating, but it’s not. When groups of variables are passed together, the way x and y are in the example above, they are likely part of a concept that deserves a name of its own. -Argument Lists +## Argument Lists Sometimes we want to pass a variable number of arguments into a function. Consider, for example, the String.format method: ```java String.format(”%s worked %.2f hours.”, name, hours); @@ -343,12 +340,12 @@ So all the same rules apply. Functions that take variable arguments can be monad void dyad(String name, Integer… args); void triad(String name, int count, Integer… args); ``` -Verbs and Keywords +## Verbs and Keywords Choosing good names for a function can go a long way toward explaining the intent of the function and the order and intent of the arguments. In the case of a monad, the function and argument should form a very nice verb/noun pair. For example, write(name) is very evocative. Whatever this “name” thing is, it is being “written.” An even better name might be writeField(name), which tells us that the “name” thing is a “field.” This last is an example of the keyword form of a function name. Using this form we encode the names of the arguments into the function name. For example, assertEquals might be better written as assertExpectedEqualsActual(expected, actual). This strongly mitigates the problem of having to remember the ordering of the arguments. -HAVE NO SIDE EFFECTS +## HAVE NO SIDE EFFECTS Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will make unexpected changes to the variables of its own class. Sometimes it will make them to the parameters passed into the function or to system globals. In either case they are devious and damaging mistruths that often result in strange temporal couplings and order dependencies. Consider, for example, the seemingly innocuous function in Listing 3-6. This function uses a standard algorithm to match a userName to a password. It returns true if they match and false if anything goes wrong. But it also has a side effect. Can you spot it? @@ -378,7 +375,7 @@ The side effect is the call to Session.initialize(), of course. The checkPasswor This side effect creates a temporal coupling. That is, checkPassword can only be called at certain times (in other words, when it is safe to initialize the session). If it is called out of order, session data may be inadvertently lost. Temporal couplings are confusing, especially when hidden as a side effect. If you must have a temporal coupling, you should make it clear in the name of the function. In this case we might rename the function checkPasswordAndInitializeSession, though that certainly violates “Do one thing.” -Output Arguments +## Output Arguments Arguments are most naturally interpreted as inputs to a function. If you have been programming for more than a few years, I’m sure you’ve done a double-take on an argument that was actually an output rather than an input. For example: ```java appendFooter(s); @@ -395,7 +392,7 @@ In the days before object oriented programming it was sometimes necessary to hav ``` In general output arguments should be avoided. If your function must change the state of something, have it change the state of its owning object. -COMMAND QUERY SEPARATION +## COMMAND QUERY SEPARATION Functions should either do something or answer something, but not both. Either your function should change the state of an object, or it should return some information about that object. Doing both often leads to confusion. Consider, for example, the following function: ```java public boolean set(String attribute, String value); @@ -413,7 +410,7 @@ The author intended set to be a verb, but in the context of the if statement it … } ``` -PREFER EXCEPTIONS TO RETURNING ERROR CODES +## PREFER EXCEPTIONS TO RETURNING ERROR CODES Returning error codes from command functions is a subtle violation of command query separation. It promotes commands being used as expressions in the predicates of if statements. ```java if (deletePage(page) == E_OK) @@ -446,7 +443,7 @@ On the other hand, if you use exceptions instead of returned error codes, then t logger.log(e.getMessage()); } ``` -Extract Try/Catch Blocks +## Extract Try/Catch Blocks Try/catch blocks are ugly in their own right. They confuse the structure of the code and mix error processing with normal processing. So it is better to extract the bodies of the try and catch blocks out into functions of their own. ```java public void delete(Page page) { @@ -470,10 +467,10 @@ Try/catch blocks are ugly in their own right. They confuse the structure of the ``` In the above, the delete function is all about error processing. It is easy to understand and then ignore. The deletePageAndAllReferences function is all about the processes of fully deleting a page. Error handling can be ignored. This provides a nice separation that makes the code easier to understand and modify. -Error Handling Is One Thing +## Error Handling Is One Thing Functions should do one thing. Error handing is one thing. Thus, a function that handles errors should do nothing else. This implies (as in the example above) that if the keyword try exists in a function, it should be the very first word in the function and that there should be nothing after the catch/finally blocks. -The Error.java Dependency Magnet +## The Error.java Dependency Magnet Returning error codes usually implies that there is some class or enum in which all the error codes are defined. ```java public enum Error { @@ -494,18 +491,18 @@ When you use exceptions rather than error codes, then new exceptions are derivat 12. This is an example of the Open Closed Principle (OCP) [PPP02]. -DON’T REPEAT YOURSELF13 +## DON’T REPEAT YOURSELF13 13. The DRY principle. [PRAG]. Look back at Listing 3-1 carefully and you will notice that there is an algorithm that gets repeated four times, once for each of the SetUp, SuiteSetUp, TearDown, and SuiteTearDown cases. It’s not easy to spot this duplication because the four instances are intermixed with other code and aren’t uniformly duplicated. Still, the duplication is a problem because it bloats the code and will require four-fold modification should the algorithm ever have to change. It is also a four-fold opportunity for an error of omission. -Image +![](figures/ch3/3_4fig_martin.jpg) This duplication was remedied by the include method in Listing 3-7. Read through that code again and notice how the readability of the whole module is enhanced by the reduction of that duplication. Duplication may be the root of all evil in software. Many principles and practices have been created for the purpose of controlling or eliminating it. Consider, for example, that all of Codd’s database normal forms serve to eliminate duplication in data. Consider also how object-oriented programming serves to concentrate code into base classes that would otherwise be redundant. Structured programming, Aspect Oriented Programming, Component Oriented Programming, are all, in part, strategies for eliminating duplication. It would appear that since the invention of the subroutine, innovations in software development have been an ongoing attempt to eliminate duplication from our source code. -STRUCTURED PROGRAMMING +## STRUCTURED PROGRAMMING Some programmers follow Edsger Dijkstra’s rules of structured programming.14 Dijkstra said that every function, and every block within a function, should have one entry and one exit. Following these rules means that there should only be one return statement in a function, no break or continue statements in a loop, and never, ever, any goto statements. 14. [SP72]. @@ -514,7 +511,7 @@ While we are sympathetic to the goals and disciplines of structured programming, So if you keep your functions small, then the occasional multiple return, break, or continue statement does no harm and can sometimes even be more expressive than the single-entry, single-exit rule. On the other hand, goto only makes sense in large functions, so it should be avoided. -HOW DO YOU WRITE FUNCTIONS LIKE THIS? +## HOW DO YOU WRITE FUNCTIONS LIKE THIS? Writing software is like any other kind of writing. When you write a paper or an article, you get your thoughts down first, then you massage it until it reads well. The first draft might be clumsy and disorganized, so you wordsmith it and restructure it and refine it until it reads the way you want it to read. When I write functions, they come out long and complicated. They have lots of indenting and nested loops. They have long argument lists. The names are arbitrary, and there is duplicated code. But I also have a suite of unit tests that cover every one of those clumsy lines of code. @@ -523,14 +520,14 @@ So then I massage and refine that code, splitting out functions, changing names, In the end, I wind up with functions that follow the rules I’ve laid down in this chapter. I don’t write them that way to start. I don’t think anyone could. -CONCLUSION +## CONCLUSION Every system is built from a domain-specific language designed by the programmers to describe that system. Functions are the verbs of that language, and classes are the nouns. This is not some throwback to the hideous old notion that the nouns and verbs in a requirements document are the first guess of the classes and functions of a system. Rather, this is a much older truth. The art of programming is, and has always been, the art of language design. Master programmers think of systems as stories to be told rather than programs to be written. They use the facilities of their chosen programming language to construct a much richer and more expressive language that can be used to tell that story. Part of that domain-specific language is the hierarchy of functions that describe all the actions that take place within that system. In an artful act of recursion those actions are written to use the very domain-specific language they define to tell their own small part of the story. This chapter has been about the mechanics of writing functions well. If you follow the rules herein, your functions will be short, well named, and nicely organized. But never forget that your real goal is to tell the story of the system, and that the functions you write need to fit cleanly together into a clear and precise language to help you with that telling. -SETUPTEARDOWNINCLUDER +## SETUPTEARDOWNINCLUDER Listing 3-7 SetupTeardownIncluder.java ```java diff --git a/docs/ch4.md b/docs/ch4.md index 72f3796..c74fc75 100644 --- a/docs/ch4.md +++ b/docs/ch4.md @@ -1,7 +1,6 @@ # 第 4 章 Comments -Image -Image +![](figures/ch4/4_1fig_martin.jpg) “Don’t comment bad code—rewrite it.”—Brian W. Kernighan and P. J. Plaugher1 1. [KP78], p. 144. @@ -148,7 +147,8 @@ There is a substantial risk, of course, that a clarifying comment is incorrect. Warning of Consequences Sometimes it is useful to warn other programmers about certain consequences. For example, here is a comment that explains why a particular test case is turned off: -Image +![](figures/ch4/4_2fig_martin.jpg) + ```java // Don't run unless you // have some time to kill. @@ -166,6 +166,7 @@ Image Nowadays, of course, we’d turn off the test case by using the @Ignore attribute with an appropriate explanatory string. @Ignore(”Takes too long to run”). But back in the days before JUnit 4, putting an underscore in front of the method name was a common convention. The comment, while flippant, makes the point pretty well. Here’s another, more poignant example: + ```java public static SimpleDateFormat makeStandardHttpDateFormat() diff --git a/docs/ch5.md b/docs/ch5.md index 0228ec8..1357cb6 100644 --- a/docs/ch5.md +++ b/docs/ch5.md @@ -1,7 +1,6 @@ # 第 5 章 Formatting -Image -Image +![](figures/ch5/5_1fig_martin.jpg) When people look under the hood, we want them to be impressed with the neatness, consistency, and attention to detail that they perceive. We want them to be struck by the orderliness. We want their eyebrows to rise as they scroll through the modules. We want them to perceive that professionals have been at work. If instead they see a scrambled mass of code that looks like it was written by a bevy of drunken sailors, then they are likely to conclude that the same inattention to detail pervades every other aspect of the project. @@ -26,7 +25,7 @@ Seven different projects are depicted. Junit, FitNesse, testNG, Time and Money, Figure 5-1 File length distributions LOG scale (box height = sigma) -Image +![](figures/ch5/5_2fig_martin.jpg) Junit, FitNesse, and Time and Money are composed of relatively small files. None are over 500 lines and most of those files are less than 200 lines. Tomcat and Ant, on the other hand, have some files that are several thousand lines long and close to half are over 200 lines. @@ -292,7 +291,7 @@ Conceptual Affinity. Certain bits of code want to be near other bits. They have As we have seen, this affinity might be based on a direct dependence, such as one function calling another, or a function using a variable. But there are other possible causes of affinity. Affinity might be caused because a group of functions perform a similar operation. Consider this snippet of code from Junit 4.3.1: -Image +![](figures/ch5/5_3fig_martin.jpg) ```java public class Assert { static public void assertTrue(String message, boolean condition) { @@ -328,7 +327,7 @@ How wide should a line be? To answer that, let’s look at how wide lines are in Figure 5-2 Java line width distribution -Image +![](figures/ch5/5_4fig_martin.jpg) This suggests that we should strive to keep our lines short. The old Hollerith limit of 80 is a bit arbitrary, and I’m not opposed to lines edging out to 100 or even 120. But beyond that is probably just careless. @@ -506,7 +505,7 @@ The title of this section is a play on words. Every programmer has his own favor A team of developers should agree upon a single formatting style, and then every member of that team should use that style. We want the software to have a consistent style. We don’t want it to appear to have been written by a bunch of disagreeing individuals. -Image +![](figures/ch5/5_5fig_martin.jpg) When I started the FitNesse project back in 2002, I sat down with the team to work out a coding style. This took about 10 minutes. We decided where we’d put our braces, what our indent size would be, how we would name classes, variables, and methods, and so forth. Then we encoded those rules into the code formatter of our IDE and have stuck with them ever since. These were not the rules that I prefer; they were rules decided by the team. As a member of that team I followed them when writing code in the FitNesse project. diff --git a/docs/ch6.md b/docs/ch6.md index 6e9c16a..70674c9 100644 --- a/docs/ch6.md +++ b/docs/ch6.md @@ -1,7 +1,6 @@ # 第 6 章 Objects and Data Structures -Image -Image +![](figures/ch6/6_1fig_martin.jpg) There is a reason that we keep our variables private. We don’t want anyone else to depend on them. We want to keep the freedom to change their type or implementation on a whim or an impulse. Why, then, do so many programmers automatically add getters and setters to their objects, exposing their private variables as if they were public? @@ -160,13 +159,10 @@ There is a well-known heuristic called the Law of Demeter2 that says a module sh More precisely, the Law of Demeter says that a method f of a class C should only call the methods of these: -• C - -• An object created by f - -• An object passed as an argument to f - -• An object held in an instance variable of C +- C +- An object created by f +- An object passed as an argument to f +- An object held in an instance variable of C The method should not invoke methods on objects that are returned by any of the allowed functions. In other words, talk to friends, not to strangers. @@ -185,7 +181,7 @@ This kind of code is often called a train wreck because it look like a bunch of ``` Are these two snippets of code violations of the Law of Demeter? Certainly the containing module knows that the ctxt object contains options, which contain a scratch directory, which has an absolute path. That’s a lot of knowledge for one function to know. The calling function knows how to navigate through a lot of different objects. -Image +![](figures/ch6/6_2fig_martin.jpg) Whether this is a violation of Demeter depends on whether or not ctxt, Options, and ScratchDir are objects or data structures. If they are objects, then their internal structure should be hidden rather than exposed, and so knowledge of their innards is a clear violation of the Law of Demeter. On the other hand, if ctxt, Options, and ScratchDir are just data structures with no behavior, then they naturally expose their internal structure, and so Demeter does not apply. diff --git a/docs/ch7.md b/docs/ch7.md index 8cba543..27e0393 100644 --- a/docs/ch7.md +++ b/docs/ch7.md @@ -1,11 +1,8 @@ # 第 7 章 Error Handling -Image - -Image by Michael Feathers -Image +![](figures/ch7/103fig01.jpg) It might seem odd to have a section about error handling in a book about clean code. Error handling is just one of those things that we all have to do when we program. Input can be abnormal and devices can fail. In short, things can go wrong, and when they do, we as programmers are responsible for making sure that our code does what it needs to do. @@ -214,7 +211,7 @@ Often a single exception class is fine for a particular area of code. The inform DEFINE THE NORMAL FLOW If you follow the advice in the preceding sections, you’ll end up with a good amount of separation between your business logic and your error handling. The bulk of your code will start to look like a clean unadorned algorithm. However, the process of doing this pushes error detection to the edges of your program. You wrap external APIs so that you can throw your own exceptions, and you define a handler above your code so that you can deal with any aborted computation. Most of the time this is a great approach, but there are some times when you may not want to abort. -Image +![](figures/ch7/103fig02.jpg) Let’s take a look at an example. Here is some awkward code that sums expenses in a billing application: ```java diff --git a/docs/ch8.md b/docs/ch8.md index dde0f7b..ee5951c 100644 --- a/docs/ch8.md +++ b/docs/ch8.md @@ -1,9 +1,8 @@ # 第 8 章 Boundaries -Image by James Grenning -Image +![](figures/ch8/113fig01.jpg) We seldom control all the software in our systems. Sometimes we buy third-party packages or use open source. Other times we depend on teams in our own company to produce components or subsystems for us. Somehow we must cleanly integrate this foreign code with our own. In this chapter we look at practices and techniques to keep the boundaries of our software clean. @@ -15,7 +14,7 @@ Let’s look at java.util.Map as an example. As you can see by examining Figure Figure 8-1 The methods of Map -Image +![](figures/ch8/114fig01.jpg) If our application needs a Map of Sensors, you might find the sensors set up like this: ```java @@ -168,7 +167,7 @@ In Figure 8-2, you can see that we insulated the CommunicationsController classe Figure 8-2 Predicting the transmitter -Image +![](figures/ch8/119fig01.jpg) This design also gives us a very convenient seam3 in the code for testing. Using a suitable FakeTransmitter, we can test the CommunicationsController classes. We can also create boundary tests once we have the TransmitterAPI that make sure we are using the API correctly. diff --git a/docs/ch9.md b/docs/ch9.md index 16ef071..d73b2ac 100644 --- a/docs/ch9.md +++ b/docs/ch9.md @@ -1,9 +1,6 @@ # 第 9 章 Unit Tests -Image -Image - -Image +![](figures/ch9/9_1fig_martin.jpg) Our profession has come a long way in the last ten years. In 1997 no one had heard of Test Driven Development. For the vast majority of us, unit tests were short bits of throw-away code that we wrote to make sure our programs “worked.” We would painstakingly write our classes and methods, and then we would concoct some ad hoc code to test them. Typically this would involve some kind of simple driver program that would allow us to manually interact with the program we had written. @@ -352,13 +349,13 @@ Listing 9-8 ``` The three test functions probably ought to be like this: -• Given the last day of a month with 31 days (like May): +- Given the last day of a month with 31 days (like May): 1. When you add one month, such that the last day of that month is the 30th (like June), then the date should be the 30th of that month, not the 31st. 2. When you add two months to that date, such that the final month has 31 days, then the date should be the 31st. -• Given the last day of a month with 30 days in it (like June): +- Given the last day of a month with 30 days in it (like June): 1. When you add one month such that the last day of that month has 31 days, then the date should be the 30th, not the 31st.