Good Books, Papers and Courses

I’ve come across some good sources of knowledge, including books, papers and online courses, that I’d like to recommend. This section covers Java, Python, Hadoop, data mining, algorithms, data analytics, cloud technology and web backend development. I only list papers, books and courses that I have completed and truly love.

Books


  1. “Hadoop: The Definitive Guide”: as the name suggests, the only definitive guide to Hadoop.
  2. “Machine Learning for Hackers”: a brief overview of popular supervised learning algorithms. The “Will it Python” blog ported the book’s R code to Python.
  3. “Python for Data Analysis”: the only book you will ever need to learn Python for data-related software development work. pandas is well explained in this book, so readers learn to think and work the “dataframe” way.
  4. “Bandit Algorithms for Website Optimization” (O’Reilly)
  5. “OpenIntro Statistics”: statistics 101, just enough statistics for software engineers like me. It often helps to think from a statistical viewpoint.
  6. “Java Performance: The Definitive Guide”: probably the only book on modern Java (Java 7 and 8) performance. Love it!
  7. “Core Java, Vol I and Vol II”: my reference book. A nice guide for beginners too.
  8. “Effective Java”: I really need to read it again!
  9. “High Performance Python”: I have not finished reading it yet, but it is a good book. It is hard to imagine how to write about “performance” in Python, but the authors did it very well.
  10. Apache Spark API by Example
  11. “Advanced Analytics with Spark”: I have read two chapters so far. A good cookbook.
  12. “Java Concurrency in Practice”: it is still not outdated in 2015!
  13. “Mastering Apache Maven 3”: a good introduction to Maven.
  14. “Effective Python”: the only “Effective” book on Python.

Online Courses from Coursera

  1. Machine Learning (Stanford): ML 101, enough for a software development engineer (SDE).
  2. Functional Programming Principles in Scala: bible.
  3. Introduction to Data Science (UW): a nice tour that touches every aspect of modern data engineering in industry.
  4. Algorithms (Princeton): practical.
  5. Mining Massive Datasets (Stanford): practical data mining for big data.
  6. Data Analysis and Statistical Inference (Duke U): the only course I needed to learn statistical inference.
  7. Principles of Reactive Programming: for every engineer building MODERN cloud products.

Papers


  1. Dynamo: Amazon’s Highly Available Key-value Store
  2. Cassandra – A Decentralized Structured Storage System

I hid (:D) two links here describing how to get a job at a cool company.

