Note: Java Coding Guidelines

This book is not a coding style guide. It offers advice on writing secure, reliable, high-quality Java by looking at both the coding and design aspects of building Java projects.

Chapter 1: Security
I’d say most of the content on security and privilege management is unfamiliar to ordinary Java coders. Although the majority of Java projects on the planet will never use those classes and configurations directly, it is beneficial to know how Java manages the security context for many daily operations: file I/O, LDAP lookups, class loading, etc. A few practical lessons learned:

  • Literally limit the lifetime of sensitive data – keep it alive only around the validation code and erase it completely afterwards, leaving no buffering or traces whatsoever.
  • Keep sensitive data out of cookies; validate them with a secret on the server side.
  • Wrap sensitive mutable classes in immutable wrappers – embrace immutability.
  • Validate everything coming from inside and outside, and format generated content before output.
  • Do not trust equals, clone, or any overridable method called from untrusted code. Make your class final to avoid this.
  • Store passwords as salted hashes.
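A minimal sketch of the salt-and-hash guideline using only JDK classes (class and method names here are illustrative, and production code should prefer a slow KDF such as PBKDF2 or bcrypt over plain SHA-256):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class SaltedHash {
    // A fresh random salt per user; store it alongside the hash, never reuse it.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // hash = SHA-256(salt || password). A slow KDF is preferable in production;
    // SHA-256 keeps this sketch dependency-free.
    static byte[] hash(char[] password, byte[] salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            md.update(new String(password).getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(char[] password, byte[] salt, byte[] expected) {
        // MessageDigest.isEqual compares in constant time, avoiding timing leaks.
        return MessageDigest.isEqual(hash(password, salt), expected);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("s3cret".toCharArray(), salt);
        System.out.println(verify("s3cret".toCharArray(), salt, stored)); // true
        System.out.println(verify("wrong".toCharArray(), salt, stored));  // false
    }
}
```

Note the char[] parameter: per the first bullet above, a char array can be zeroed out after use, whereas an interned String cannot.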

Chapter 2: Defensive Programming

  • Limit the scope of variables, of @SuppressWarnings, and the accessibility of classes/methods
  • Always let methods provide feedback via return values and exceptions
  • Identify files using multiple attributes instead of just the path, e.g. creation time
  • Beware of numeric promotion!
  • Public final constant fields are inlined into compiled class files, so a change in a later version may not be seen by library users
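The numeric-promotion warning above is worth a concrete sketch. The classic bug: arithmetic on int literals overflows before the result is widened to long (names illustrative):

```java
public class Promotion {
    // All operands are int, so the multiplication is done in 32-bit
    // arithmetic and overflows BEFORE the widening assignment to long.
    static long wrongDayMicros() {
        return 1_000_000 * 60 * 60 * 24;
    }

    // Making the first operand long promotes the whole expression to long.
    static long rightDayMicros() {
        return 1_000_000L * 60 * 60 * 24;
    }

    public static void main(String[] args) {
        System.out.println(wrongDayMicros()); // 500654080 (overflowed)
        System.out.println(rightDayMicros()); // 86400000000
    }
}
```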

Chapter 3: Reliability

Reliability – the capacity to maintain a level of performance when used under specified conditions. Faults in requirements, design, and implementation introduce failures. A few lessons learned:

Encode relationship between constant definitions

Use try-with-resources
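A minimal try-with-resources sketch (names illustrative): the resource is closed automatically on every exit path, which is exactly the reliability point.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class TryWithResources {
    // The reader is closed automatically when the block exits,
    // even if readLine throws.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader never actually throws
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("line 1\nline 2")); // line 1
    }
}
```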

Handle runtime errors gracefully instead of using assertions

Do not serialise handles to system resources – mark them transient

Direct buffer creation is expensive, so reserve direct buffers for long-lived, frequently used objects.

Remove short-lived objects from long-lived containers – null the references or point them at a special singleton NULL object


Note: Pragmatic Unit Testing in Java 8 with JUnit

I read the first edition when I was in college; it gave me a fine introduction to TDD and JUnit. I’d like to read the latest edition to refresh my memory. Compared with a decade ago, TDD and Java 8 are really hot nowadays.

— Part I: Foundation —

Determine What Tests to Write

Look at loops, if branches, and complex conditions. Think about data variants and side effects. Think through each path of execution.

3 Step Test Method Construction

Arrange -> Act -> Assert

Tests with Exception

Use the @Rule annotation with an ExpectedException object. Do not wrap checked exceptions with try/catch; instead rethrow them by declaring the test method as “@Test public void method() throws Exception {}”. This way, JUnit reports an error instead of a failure.
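The ExpectedException rule itself needs JUnit on the classpath, but the behaviour it provides can be approximated in a dependency-free sketch: pass only if the expected exception surfaces, error on anything else (class and method names illustrative):

```java
public class ExpectException {
    interface Thrower { void run() throws Exception; }

    // Passes only when body throws an instance of expected; any other
    // outcome surfaces as an error rather than a test failure.
    static <T extends Exception> T expect(Class<T> expected, Thrower body) {
        try {
            body.run();
        } catch (Exception e) {
            if (expected.isInstance(e)) return expected.cast(e);
            throw new AssertionError("unexpected exception", e);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " was not thrown");
    }

    public static void main(String[] args) {
        IllegalArgumentException e = expect(IllegalArgumentException.class,
                () -> { throw new IllegalArgumentException("bad input"); });
        System.out.println(e.getMessage()); // bad input
    }
}
```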

Organise Tests

Test behaviour rather than methods.

Physically separate tests from production code.

Exposing private behaviour is worse than exposing private data. Needing to test private behaviour means a redesign is needed.

One test for one case with descriptive naming.

Tests are documents

Do not use comments to clarify a test’s purpose. Instead, improve the test name and local-variable names, use constants and Hamcrest assertions, and split tests.

Tests Naming

doOpsGeneratesResult | someResultOccursWhenCondition

whenDoBehaviorThenResult (short for given-when-then) – BDD

Before and After

The order in which multiple @Before methods execute is undefined. @After methods run even after a failure.

Slow Tests

Annotate slow tests with a @Category annotation.

Then skip running the full suite – with caution though.

— Part II: Advanced —

FIRST: Characteristics of quality tests

  1. Fast and First (TDD)
  2. Isolated
  3. Repeatable – Java provides “fixed clock” Clock.fixed(Instant, zone) besides the “real clock” Clock.systemUTC()
  4. Self-validating – Assertion and CI tool
  5. Timely – Test more frequently.
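The Repeatable point above hinges on injecting the clock; a tiny sketch of testing against Clock.fixed (names illustrative):

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

public class FixedClockDemo {
    // Code under test takes a Clock instead of calling Instant.now() directly.
    static boolean isExpired(Instant deadline, Clock clock) {
        return Instant.now(clock).isAfter(deadline);
    }

    public static void main(String[] args) {
        Instant frozen = Instant.parse("2020-01-01T00:00:00Z");
        Clock clock = Clock.fixed(frozen, ZoneOffset.UTC); // always the same instant
        System.out.println(isExpired(frozen.plusSeconds(1), clock));  // false
        System.out.println(isExpired(frozen.minusSeconds(1), clock)); // true
    }
}
```

In production the same code is wired with Clock.systemUTC(); the test stays repeatable because the fixed clock never moves.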

Right-BICEP: what kinds of tests to write

  1. Right results: adapt unit tests to the latest requirements. Test the “happy path” here, aiming for the right answer.
  2. Boundary conditions
  3. Inverse relationships – sqrt(25) * sqrt(25) == 25
  4. Cross-check results – check with complementary cases
  5. Force error conditions – write tests that force errors to happen, including system load, network errors, etc.
  6. Performance within bounds – JMeter, JUnitPerf

CORRECT: boundary conditions

  1. Conform to an expected format
  2. Ordered / unordered
  3. Range
  4. Reference – anything external under no direct control. Behave gracefully when preconditions and postconditions are not met.
  5. Existence
  6. Cardinality – enough values. The “0-1-n” rule: test the zero case, the one case, and the many case (some 1 < x < N case plus the N case).
  7. Time – order, timing

— Part III: Refactor, Design and Mock —

JUnit helps with refactoring, which yields more maintainable clean code.

Single method serves single purpose. Command-Query Separation: a method either executes a command or answers a query, not both.

Single class serves single purpose. Map class to concept not concrete notions. “Profile and MatchProfile” example shows how to delegate the matching procedure to a MatchProfile class.

SOLID Class-design

  • Single Responsibility
  • Open-closed Principle – open for extension but closed for modification.
  • Liskov Substitution – overridden methods do not break functionality
  • Interface segregation – split large interfaces into smaller ones.
  • Dependency inversion – high-level modules do not depend on low-level modules. Instead, both depend on abstractions. Details depend on abstractions.


Methods to inject a dependency: a new constructor parameter, a setter method, an overridden factory method, abstract factories, or tools such as Google Guice or Spring.
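The constructor route is the simplest of these; a minimal sketch of constructor injection with a hand-rolled stub standing in for the real collaborator (class and method names illustrative):

```java
public class InjectionDemo {
    interface Mailer { void send(String to, String msg); }

    static class Registration {
        private final Mailer mailer; // injected, never new-ed up inside

        Registration(Mailer mailer) { this.mailer = mailer; }

        void register(String email) { mailer.send(email, "welcome"); }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // A lambda stub records calls instead of sending real mail;
        // Guice or Spring would wire the production Mailer the same way.
        Registration reg = new Registration((to, msg) -> log.append(to).append(':').append(msg));
        reg.register("a@b.c");
        System.out.println(log); // a@b.c:welcome
    }
}
```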

While speeding up tests, Mock introduces a gap between tests and production code. Hence write integration tests to cover these gaps.

Test cleanup: refactor tests as well as production code.

Make clean test code – organised, short and simple.

  • Unnecessary test code: let the test method throw the exception instead of wrapping it in try/catch
  • Missing abstractions: write a customised matcher that extends TypeSafeMatcher
  • Irrelevant information: eliminate unnecessary data and use good naming
  • Multiple assertions: use multiple assertions only if they must act together to verify a single behaviour. Only single-purpose tests are allowed.
  • Irrelevant details in tests: move irrelevant setup/teardown code to @Before and @After.
  • Misleading organisation: organise as arrange/act/assert (AAA).
  • Implicit meaning: make text and names explicit.

— Part IV: TDD, Test Threads and Persistence, CI —

– TDD – Test often while making SMALL steps. Refactor often. Split tests into multiple test classes even when testing the same class; this keeps the method names shorter. Test names serve as documentation, so naming is of great importance.

– Testing Threads and Persistence –

Quote: A> “rework the design to better support testing by separating concerns”; B> “Break dependencies using stubs and mocks”

– Testing on a Project –

Code coverage and CI. 70% coverage is usually a guideline. Use CI to improve code quality of the whole team.

Notes: Java Concurrency in Practice

Chapter 6. Task Execution

Drawbacks of creating an unlimited number of threads: thread lifecycle overhead (creation and teardown) and resource consumption. There is also a platform/JVM-dependent limit – blowing the stack limit leads to OutOfMemoryError. Thread pools:

  • newFixedThreadPool
  • newCachedThreadPool
  • newSingleThreadExecutor
  • newScheduledThreadPool – timer
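A minimal use of one of these factories, submitting a Callable and collecting the result through a Future (names illustrative):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {
    // Submit a Callable, collect its result through the Future,
    // then shut the pool down (lifecycle management via ExecutorService).
    static int multiplyOnPool(int a, int b) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> future = pool.submit(() -> a * b);
            return future.get(); // blocks until the task completes
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(multiplyOnPool(6, 7)); // 42
    }
}
```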

ExecutorService extends Executor, adding methods for lifecycle management.

Runnable -> Callable -> Future -> completion queue:

  • A Runnable has limited side effects: writing to a log or a shared data structure.
  • A Callable is expected to return a value or throw an exception.
  • The lifecycle of a task submitted to an Executor: created, submitted, started, completed.
  • Future provides methods to test the task’s state and to get its result or exception.
  • A completion queue holds results as QueueingFutures, a subclass of FutureTask that places results on a BlockingQueue.
  • invokeAll submits a list of tasks to an executor.

Chapter 7. Cancellation and Shutdown

Quote: “FutureTask and Executor simplify building cancellable tasks and services.”

A single volatile boolean flag is not always enough: a task blocked on a queue may never return to re-check the boolean of its while-check loop. Calling interrupt does not guarantee the thread will stop. A guest task should preserve the executing thread’s interruption status:

  • propagate the exception, making the method an interruptible blocking method, or
  • restore the interruption status so that code higher up the call stack can deal with it.

Cancel through Future. Do not manipulate a thread unless you own it.

JVM shutdown: shutdown hooks; normal threads v.s. daemon threads – daemon threads do not prevent the JVM from shutting down. Finalisers are only useful for managing objects that hold resources acquired by native methods.

Chapter 8. Applying Thread Pools

A bounded thread pool faces potential starvation deadlock, e.g. when a task depends on two tasks it created and those two are queued behind it. Beware of long-running tasks. Work queues: bounded queue, unbounded queue, or synchronous handoff. Saturation policies deal with a filled bounded work queue. ThreadPoolExecutor is easily extended by overriding the beforeExecute and afterExecute hooks. The puzzle-solver framework from page 114 is a comprehensive example.

Chapter 10. Avoiding Liveness Hazards

Deadlock: acquiring multiple locks introduces a deadlock hazard.
The order of acquiring locks should be the same on every code path. The hazard is not always obvious: dynamic lock-order deadlocks (withdraw and lodge simultaneously: A -> B and B -> A); deadlock between cooperating objects.

Open calls: prefer synchronized blocks to synchronized methods. Quote: “Programs that rely on open calls are easier to analyse than those that call alien methods with locks held.”

Resource deadlock. Quote: “bounded pools and interdependent tasks do not mix well”, because a thread can starve while waiting for a long-running computation to finish. Lock ordering is part of the design whenever a program has to acquire multiple locks. Prefer a timed tryLock, i.e. explicit locks, to intrinsic locks.

Starvation: avoid changing thread priorities, which introduces platform dependency and possibly starvation, where a thread perpetually yields to other threads.

Poor responsiveness: lower the priority of computationally expensive background threads to make foreground threads more responsive. It can also be caused by poor lock management, for example a thread taking too long to iterate a locked collection while others are blocked and unresponsive.

Livelock: a thread that is not blocked makes no progress, trying and failing repeatedly. It can also happen when multiple threads keep changing their state in response to each other so that nobody makes progress. Solution: introduce randomness into the retry mechanism.

Chapter 11. Performance and Scalability

Performance costs introduced by multi-threading: locking, signalling, context switches, memory synchronisation, thread creation and teardown, scheduling overhead.

  • Context switch – the scheduler switches out blocked threads.
  • Memory synchronisation – visibility guarantees may involve memory barriers that flush and invalidate caches.
  • Blocking – uncontended synchronisation can be optimised by the JVM; contended synchronisation actively involves the OS, context switches, etc.
Performance – how fast; how much. Scalability – the ability to improve performance when additional resources are added. Always measure before optimising (perfbar).

Reduce lock contention to improve performance and scalability. Serialisation hurts scalability; context switches hurt performance; contended locks hurt both. Approaches:

  1. reduce the duration for which locks are held
  2. reduce the frequency of lock requests
  3. replace exclusive locks with other coordination mechanisms – e.g. centralised logging with a queue v.s. each thread contending for the shared I/O stream.

Allocating new objects is usually cheaper than synchronising.

Chapter 12. Testing Concurrent Programs

Safety: “nothing bad ever happens” | Liveness: “something good eventually happens”. Introduce more interleavings:

  • Create more threads than the number of cores
  • Use Thread.yield to force more context switches
  • Test on a variety of systems with different processor counts, OSes and processor architectures

Adding calls for testing and removing them for production can be implemented with aspect-oriented programming tools. Run benchmarks after the code has warmed up, i.e. once the JIT has decided to compile the bytecode instead of interpreting it. Use -server instead of -client for both test and production code. Avoid dead code in thread-local computation: print the values in case the compiler optimises them away – or better, compare the object’s hashCode with an arbitrary value and print only if the two match. QA methodologies:

  • Testing
  • Code Review
  • Static Analysis Tools
  • Aspect-oriented Testing Techniques
  • Profilers and Monitoring Tools

Chapter 13. Explicit Locks

ReentrantLock v.s. intrinsic locks: fail gracefully and remain responsive by using a timed lock (lock.tryLock(nanosToLock, unit)) or an interruptible lock (lock.lockInterruptibly()). Advanced features of ReentrantLock: timed, polled (tryLock) and interruptible lock acquisition, fair queuing, and non-block-structured locking (a non-blocking linked-list implementation makes use of the last one).

Choose intrinsic locks when you do not need the advanced features of ReentrantLock: intrinsic locking information is included in thread dumps with call frames, and synchronized blocks allow JVM optimisation.

A fair lock performs worse than an unfair lock. For example, while B is suspended waiting for A’s lock, C may grab the lock instantly and finish its work before B completely wakes up; an unfair lock works better in this scenario. The JVM is not required to implement fairness for intrinsic locks. Always unlock a ReentrantLock in a finally block.

Read-write lock: accessible by multiple readers or a single writer at any time, but not both.

Chapter 14. Building Custom Synchronisers

These are used to build state-dependent classes. A condition queue describes how multiple threads wait on the same object’s notification. The condition has to be re-checked each time the thread is notified; remember to notify while holding the lock (notify inside a synchronized block or method); almost always use notifyAll instead of notify (use notify only with uniform waiters: one condition predicate plus one-in, one-out). Issues with condition queues: subclass safety – document the notification protocol, or make the class final to prevent subclassing; encapsulate the condition queue so alien code cannot wait on it.

Alongside the single condition queue of an intrinsic lock, explicit Conditions exist for explicit locks:

  • wait – await
  • notify – signal
  • notifyAll – signalAll
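The mapping above falls out naturally in a bounded buffer built on ReentrantLock with two Conditions – one wait set per predicate, which intrinsic locks cannot offer (class names illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer<E> {
    private final Deque<E> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // one wait set per predicate
    private final Condition notEmpty = lock.newCondition();

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    void put(E e) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await(); // re-check on every wakeup
            items.addLast(e);
            notEmpty.signal();
        } finally {
            lock.unlock(); // always unlock in finally
        }
    }

    E take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();
            E e = items.removeFirst();
            notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }

    // Single-threaded demo: two puts within capacity never block.
    static String demo() {
        try {
            BoundedBuffer<String> buf = new BoundedBuffer<>(2);
            buf.put("a");
            buf.put("b");
            return buf.take() + buf.take();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ab
    }
}
```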

Quote: “Condition offers a richer feature set than intrinsic condition queues: multiple wait sets per lock, interruptible and uninterruptible condition waits, deadline-based waiting, and a choice of fair or nonfair queueing.”

ReentrantLock, Semaphore, ReentrantReadWriteLock, CountDownLatch, SynchronousQueue and FutureTask are built on AQS – AbstractQueuedSynchronizer. AQS holds a state that can be used to implement different concurrency mechanisms.

Chapter 15. Atomic Variables and Non-blocking Algorithms

Compare-and-swap (CAS). Non-blocking algorithms can still exhibit starvation or livelock. Implementing a concurrent linked queue using CAS is a good example. The ABA problem: use AtomicStampedReference (and its cousin AtomicMarkableReference) to address it, because they atomically update a pair of values – a reference and a version number.
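The CAS retry loop is the core idiom behind these classes; a minimal sketch using AtomicInteger (names illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Classic CAS loop: read, compute, attempt the swap, retry on contention.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) return next;
            // another thread won the race; loop and try again
        }
    }

    public static void main(String[] args) {
        CasCounter c = new CasCounter();
        System.out.println(c.increment()); // 1
        System.out.println(c.increment()); // 2
    }
}
```

No lock is ever held, so a stalled thread cannot block others – but under heavy contention the loop may retry many times, which is where starvation and livelock concerns come from.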

Notes: Java Performance: The Definitive Guide

Chapter 2: Testing Approaches
Microbenchmarks: measure execution time of code snippets
Be aware of compiler optimisation.
Be aware of a cold VM: measure after the warm-up period.
Be aware of thread contention.

Macrobenchmarks: measure the whole system in conjunction with any external resources.
Measure in the production environment: full-system testing with multiple JVMs.
Pro: can optimise the biggest bottleneck first.

Mesobenchmark: e.g. benchmark the response time of rendering from JSP.

Elapsed Time
Response Time
* statistical significance
* Test early; Test often.
* Measure; Automate; On target system.

A load generator: Faban.

Chapter 3: Java Performance Toolbox
Tools from Unix/Linux: sar, vmstat, prstat, iostat

Check CPU time first to understand why CPU usage is low. The goal is to drive the CPU usage up.

Disk usage
Swapping moves pages of data from RAM to disk and vice versa. This results in bad performance, especially for Java, because of the size of the Java heap.
vmstat has si and so for swap in and swap out respectively.
Common cause of IO bottleneck: inefficient writing; writing too much.

Usually network monitoring does not show utilization but counts packets and bits.
netstat is very basic.
nicstat shows utilization stats; 40% utilization means saturated for a LAN.

Java Tools
Working with tuning flags: jcmd jinfo
Thread information jstack jcmd
Heap: jmap
UI: jconsole – classes, GC, etc.; jvisualvm – VM info

Profiling Tools
sampling mode: relatively low profile; fewer measurement artifacts.
instrumented mode: intrusive; more information
Thread Timelines from profilers show how threads work together.
Native Profilers inspect JVM itself including JVM compiler code, GC and application code.
Java Mission Control: jmc command.

Chapter 4: “Just in time” compiler
“JIT” sits between interpreted and compiled execution. The source code is compiled to Java bytecode, which is then interpreted or compiled depending on how hot or critical that piece of code is. The JVM interprets code first because, by running it, it learns more about the code and how to optimise it when compiling.

Trade-off: when to read values from RAM and when to keep them in registers.
Threads do not see each other’s registers; synchronization is needed in such scenarios.

JIT compilers: 32-bit client; 32-bit server; 64-bit server.
c1 / c2 compiler: client (compilation starts early) / server (compilation starts late but produces faster compiled code).
Tiered compilation: a mix of both of the above; it often performs best for long-running programs. Enabled by default since Java 8.

32-bit v.s. 64-bit: use the 32-bit JIT compiler if the total process size (heap, permgen, native code and native memory) < 3 GB. 32-bit has a smaller footprint.

Tuning needed:
JIT version
Code Cache that holds the assembly lang instructions
Compilation Thresholds

Inspecting the compilation:
1. jstat -compiler <pid>
2. start the program with java -XX:+PrintCompilation

Chapter 12: Java SE API

Buffered I/O: always wrap I/O streams with BufferedOutputStream/BufferedInputStream. ByteArrayInputStream/ByteArrayOutputStream, however, are buffers themselves.

Classloader: Java 7 enables parallel class loading.

Random numbers: java.util.Random is slow in multi-threaded scenarios, where java.util.concurrent.ThreadLocalRandom is considerably faster. In contrast to these two, true random numbers are generated from system events (entropy-based sources).

JNI: avoid it. “Java calling C” is slower than “C calling Java” because a this object is passed across the boundary.

Exceptions: they are not that slow. System exceptions are more optimised than exceptions created in user code. Use defensive coding for good performance. Disabling deep stack traces speeds things up: filling in a stack trace may trigger class loading, which is costly due to the hierarchical classloader structure: app classloader -> system -> bootstrap (root) loader.

String: reuse strings; String encode()/decode() are slow; use StringBuilder; one-line concatenation is optimised by javac.
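The StringBuilder point is easy to demonstrate: repeated += copies the whole accumulated string each round, while append works in place (names illustrative):

```java
public class ConcatDemo {
    // Repeated += on String copies the accumulated string every round: O(n^2).
    static String slowJoin(String[] parts) {
        String s = "";
        for (String p : parts) s += p;
        return s;
    }

    // StringBuilder appends into a growable buffer: O(n).
    static String fastJoin(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(slowJoin(parts)); // abc
        System.out.println(fastJoin(parts)); // abc
    }
}
```

The results are identical; only the cost differs – and, as the note says, a single-expression concatenation is already turned into a StringBuilder chain by javac, so the rewrite only matters inside loops.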

Logging: apply minimum logging that meets business needs.

Collections: choose the right data structure. The concurrent* collections are much faster than the older synchronised versions, with little extra memory overhead compared with the unsynchronised versions. Initialise collections with the right size/capacity.

Lambdas: lambdas behave like anonymous classes, but they are not anonymous classes – they compile to a static method invoked through JDK helper machinery. The performance of the two is the same in usual cases, but lambdas are faster to load because anonymous classes involve the classloader.

Streams: Parallelised code and lazy evaluation.

Filter: single filter is faster than iterator.

Notes: Mining Massive Data Sets

Chapter 3: Finding Similar Items

‘Jaccard similarity’ measures the relative size of the intersection between sets. Another approach is collaborative filtering.

Shingling of Documents

Shingle size: k = 5 gives 27^5 ≈ 14 M possible shingles. This is sufficient for emails, as emails are usually far shorter than 14 M characters. k = 9 is a common choice for long documents.

Stop-word-based ‘shingles from words’ work better when finding similar news articles. This is because news articles use far more stop words than ads and the like, so the measured similarity is more sensitive to the articles and less to the surrounding ads.

Reducing the size of shingle sets:
Hash shingles to 4 bytes each.
Compute signatures from shingles using minhashing: for each random permutation of the rows, record the first row in which each set has a 1 (p. 81). In this way, an N-row * M-column characteristic matrix is compressed to a k-row * M-column signature matrix, where k is the number of randomly selected permutations. Quote: “The probability that the minhash function for a random permutation of rows produces the same value for two sets equals the Jaccard similarity of those sets.”
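The shingling and Jaccard steps are small enough to sketch directly (Java to match the rest of these notes; class and method names illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class Shingles {
    // All k-character shingles of a string.
    static Set<String> shingles(String text, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= text.length(); i++) {
            out.add(text.substring(i, i + k));
        }
        return out;
    }

    // Jaccard similarity = |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> a = shingles("abcde", 2); // {ab, bc, cd, de}
        Set<String> b = shingles("bcdef", 2); // {bc, cd, de, ef}
        System.out.println(jaccard(a, b));    // 0.6 (intersection 3, union 5)
    }
}
```

Minhashing then approximates exactly this quantity without materialising the intersections, which is what makes it scale.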

Notes: High Performance Python

Profiling Toolbox:
Print the duration of a computation using time.time(). A wrapper decorator produces simple and elegant code.

''' copied from the book
import time
from functools import wraps

def timefn(fn):
    @wraps(fn)
    def measure_time(*args, **kwargs):
        t1 = time.time()
        result = fn(*args, **kwargs)
        t2 = time.time()
        print("@timefn:" + fn.__name__ + " took " + str(t2 - t1) + " seconds")
        return result
    return measure_time

@timefn
def calculate(para1, para2, para3):
    ...
'''

Use the Unix time command; make sure to use /usr/bin/time directly.
cProfile profiles a whole Python module: python -m cProfile -s
Use line_profiler with the @profile annotation.
Use memory_profiler with the @profile annotation.
Print hp.heap() — needs guppy installed.
Dowser — live performance monitoring.
The dis module helps inspect CPython bytecode: dis.dis(module.func)
Perf — Linux tool to inspect paging, cache misses, CPU usage and a lot MORE!

List v.s. Tuple
Use tuples for immutable lists. Both store references, hence both can hold objects of different types.

Dict and Sets
The hashing implementation in Python uses open addressing. Usually the last k bits of the hash value determine the initial bucket. On collision, further bits are used to compute an offset from which the next bucket position is selected.

Resizing happens on insertion, not deletion. About 2/3 full is optimal. On resize, the number of buckets grows 4x until 50,000 entries, then 2x. The table can shrink as well when necessary.

Namespacing lookup cost: global lookup > local lookup > locally assigned variable.
math.sin > sin (after from math import sin) > a = math.sin (local alias)

Iterators and Generators

Iterators are very useful as lazy evaluation is applied here. This is an advanced topic with much more details to consider.

Matrix and Vector Computation
System calls (paging, I/O, etc.) are slow. New memory allocation is slow (in-place operations are fast). Cache misses cause slow execution, as does memory fragmentation. Branching is slow (a mispredicted if/else stalls loading data into the cache).

Bringing a chunk of useful data into cache and memory is important. This requires appropriate data structures, e.g. numpy arrays, vectors and matrices, that group useful data together. In contrast, a normal Python list stores only references, with the actual data scattered all over the place.

Fewer CPU instructions often mean less execution time. Use the ‘perf’ Linux tool here to gain a deep understanding of the program.

Multi-armed Bandit Algorithms

Multi-armed bandit algorithms are another approach to unveiling the performance difference between two algorithms when running campaigns. “Why Multi-armed Bandit Algorithm is Not ‘Better’ than A/B Testing” suggests that it is not directly comparable with A/B testing: bandits should primarily be used for continuous optimisation, whilst A/B testing should be used to draw conclusions of statistical significance with less traffic. In an A/B test the incoming traffic is split equally; a bandit, in contrast, diverts traffic to the currently best-performing variant most (say 90%) of the time.

A handy book on running bandit algorithms for web optimisation “Bandit Algorithms for Website Optimization” published by O’Reilly Media.

Drawbacks of epsilon greedy algorithm:
Say epsilon = 0.1. At the beginning the algorithm does not explore much: with 0.9 probability it exploits the limited set of arms it has tried so far. In the later stages, it still plays inferior arms 10% of the time, despite knowing they have not performed well.
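An epsilon-greedy arm-selection sketch (Java here to match the rest of these notes; class and field names illustrative, with incremental-mean value updates):

```java
import java.util.Random;

public class EpsilonGreedy {
    final double epsilon;
    final double[] values; // running mean reward per arm
    final int[] counts;    // pulls per arm
    final Random rng;

    EpsilonGreedy(int arms, double epsilon, long seed) {
        this.epsilon = epsilon;
        this.values = new double[arms];
        this.counts = new int[arms];
        this.rng = new Random(seed);
    }

    int selectArm() {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(values.length); // explore: uniform random arm
        }
        int best = 0; // exploit: current best estimate
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }

    void update(int arm, double reward) {
        counts[arm]++;
        values[arm] += (reward - values[arm]) / counts[arm]; // incremental mean
    }

    public static void main(String[] args) {
        // epsilon = 0 degenerates to pure exploitation: once arm 1 has any
        // reward, it is always chosen.
        EpsilonGreedy bandit = new EpsilonGreedy(2, 0.0, 42);
        bandit.update(1, 1.0);
        System.out.println(bandit.selectArm()); // 1
    }
}
```

The fixed epsilon is exactly the drawback described above: the explore/exploit split never adapts to how much has been learned.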

How to measure the performance of a particular bandit algorithm itself?
Probability of selecting the best arm: a high epsilon keeps exploring, so it reaches its peak fast, but the peak probability is low compared with low-epsilon settings, say 0.1.

Average Reward

Cumulative reward: this gives the whole picture whereas the above two give performance at a certain point.

In conclusion, the performance of a bandit algorithm depends on the horizon, namely how many steps there are before evaluation finishes.

Softmax Algorithm
How much worse is the worse arm compared with the better one? 10 vs 13 is different from 10 vs 99. Therefore, instead of choosing uniformly at random when exploring, choose with probability exp(Score_a) / (exp(Score_a) + exp(Score_b)). This easily outperforms the epsilon-greedy bandit.

It’s better to explore less in the later stages – annealing softmax.

Unlike epsilon-greedy and softmax, which are easily misled by noise, UCB is more robust. It looks at how much we know about each arm rather than simply how much reward we have got already. Mind you, a good arm may initially perform badly by chance.

UCB exploration bonus: sqrt((2 * log(total_counts)) / counts[arm]).
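That bonus term can be sketched directly: the score is the estimated value plus an exploration bonus that shrinks as an arm accumulates pulls (names illustrative):

```java
public class Ucb1 {
    // UCB1 score = mean reward + exploration bonus. The bonus grows with the
    // total number of pulls but shrinks with this arm's own pull count, so
    // under-explored arms float to the top.
    static double score(double meanReward, int totalCounts, int armCounts) {
        return meanReward + Math.sqrt(2.0 * Math.log(totalCounts) / armCounts);
    }

    public static void main(String[] args) {
        // Equal observed reward: the less-explored arm gets the larger bonus.
        double rarelyTried = score(0.5, 100, 5);
        double oftenTried  = score(0.5, 100, 50);
        System.out.println(rarelyTried > oftenTried); // true
    }
}
```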

The UCB family is very practical in the real world as it has no free parameters to tune.

Tips for Real world Deployment
Run an A/A test to make sure the testing method and metrics are correct, in that the same algorithm produces the same result.

Multiple concurrent tests may add certainty to the results.

A/B tests make sense for short, periodic testing, while bandit algorithms are most useful for continuous experiments.

Selected Advanced Topics
Contextual Bandits
Update arms’ value using conventional ML, for example, GLM, LR to learn from and adapt to context.

Bandit Algorithm at Scales
Assign users in advance;
Update arm values in batches.