How to Write a Git Commit Message
A good Git commit message serves as a log that tells WHY and WHEN we made changes, whilst the diff shows only WHAT. A useful message comprises:
A concise imperative subject (about 50 chars) with a reference to the issue/change #
One blank line between subject and body
A body that explains WHAT, WHY, and HOW, wrapped at 72 chars
Example:

[KERNEL-003] Update Network Module to Support Wifi

Nowadays people use Wifi, so it is unacceptable for the kernel not to
support it.

Add a new subcomponent to the network module that provides the following
features:
- listen to the Wifi driver
- save logs
A one-liner is acceptable if the change is simple and straightforward.
In the shared-repository model, a PR triggers an automatic test and build on CI, plus a review request. The guidelines for writing a good commit message also apply to writing a good PR title and body.
Semantic versioning (MAJOR.MINOR.PATCH):
Bump Major when the API changes incompatibly
Bump Minor when adding new functionality without breaking the API
Bump Patch for bug fixes
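The three bump rules can be sketched as a small helper. The `bump` function and its string-based version handling are illustrative only, not part of any real tool:

```python
def bump(version: str, change: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string per the rules above."""
    major, minor, patch = (int(x) for x in version.split("."))
    if change == "major":   # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":   # new functionality, API-compatible
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(bump("1.4.2", "minor"))  # → 1.5.0
```

Note that a major bump resets minor and patch to zero, and a minor bump resets patch.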
Code review tips:
Review the commit message as well as the actual code.
Don't forget to praise.
If in doubt, question rather than judge.
Look at the whole design and code surrounding the change, not just the change itself.
There are many ways to get things done; respect the author.
Bagging (bootstrap aggregating)
A simple and straightforward way of ensembling models by averaging results from multiple models. Each model is trained on a bootstrap sample of the data, drawn with replacement. Each model votes with equal weight: averaging for regression, majority vote for classification.
E.g. random forests.
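A minimal sketch of bagging, assuming a made-up 1-D dataset and a toy threshold-stump model (both invented for illustration); only the bootstrap sampling and the majority vote are the point here:

```python
import random
from collections import Counter

random.seed(0)

# toy 1-D dataset: label is 1 exactly when x > 5
data = [(x, int(x > 5)) for x in range(11)]

class Stump:
    """Trivial model: picks the threshold that best splits its training sample."""
    def fit(self, sample):
        best_t, best_acc = None, -1
        for t in range(11):
            acc = sum(int(x > t) == y for x, y in sample)
            if acc > best_acc:
                best_t, best_acc = t, acc
        self.t = best_t
        return self

    def predict(self, x):
        return int(x > self.t)

# each model is trained on a bootstrap sample (drawn with replacement)
models = [Stump().fit(random.choices(data, k=len(data))) for _ in range(25)]

def bagged_predict(x):
    # classification: every model votes with equal weight; take the majority
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]

print(bagged_predict(2), bagged_predict(8))
```

For regression the `Counter` vote would be replaced by a plain mean of the models' outputs.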
Boosting
Train models sequentially, starting with equally weighted data.
Increase the weights on misclassified data for the next model, and so on.
The final prediction is a weighted vote across all the models.
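The reweighting step above can be sketched as a toy AdaBoost-style update; the five data points and the misclassification pattern are made up for illustration:

```python
import math

# equal starting weights over 5 training points
weights = [0.2] * 5
misclassified = [False, True, False, False, True]  # result of the first model

# weighted error of the current model
err = sum(w for w, m in zip(weights, misclassified) if m)   # 0.4
alpha = 0.5 * math.log((1 - err) / err)                     # model's vote weight

# increase weights on misclassified points, decrease on correct ones
weights = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
total = sum(weights)
weights = [w / total for w in weights]  # renormalise to sum to 1

print([round(w, 3) for w in weights])  # → [0.167, 0.25, 0.167, 0.167, 0.25]
```

After the update, the misclassified points carry half of the total weight, which forces the next model to focus on them.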
Stacking
Train a meta-model that takes the outputs of multiple base models as input.
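A minimal sketch of stacking for regression: the base models' predictions become the features of a meta-model, here a hand-rolled two-weight least-squares fit. The targets and base predictions are invented for illustration:

```python
# toy regression targets and two base models' predictions on the same points
y  = [1.0, 2.0, 3.0, 4.0]
p1 = [0.9, 2.1, 2.8, 4.2]   # base model 1's output
p2 = [1.2, 1.8, 3.3, 3.9]   # base model 2's output

# meta-model: y ≈ w1*p1 + w2*p2, fitted by solving the 2x2 normal equations
a11 = sum(a * a for a in p1)
a12 = sum(a * b for a, b in zip(p1, p2))
a22 = sum(b * b for b in p2)
b1 = sum(a * t for a, t in zip(p1, y))
b2 = sum(b * t for b, t in zip(p2, y))
det = a11 * a22 - a12 * a12
w1 = (b1 * a22 - b2 * a12) / det
w2 = (a11 * b2 - a12 * b1) / det

def stacked_predict(base1_out, base2_out):
    # the meta-model consumes base-model outputs, not the raw features
    return w1 * base1_out + w2 * base2_out

print(round(w1, 3), round(w2, 3))
```

In practice the meta-model is fitted on held-out predictions to avoid leaking the base models' training error into it.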
“A successful Git branching model” by Vincent Driessen. Very nice read.
Hadoop supports both the s3 and s3n file systems when communicating with AWS S3.
s3n, the S3 native file system, stores files in their original format. s3 is the S3 block file system: block-based storage, an equivalent of HDFS implemented on top of S3. Other S3 tools cannot recognise the original file format and see only a bunch of block files.
However, s3n imposes a size limit of 5 GB per file. s3 does not prevent users from storing files bigger than 5 GB.
Besides, s3 puts block files directly into an S3 bucket, occupying the whole bucket without differentiating folders, whilst s3n puts files in their original shape into folders under the S3 bucket. Hence s3n is more flexible in this sense.
Since my test files are mostly smaller than 1 GB, hadoop fs -cp outperformed hadoop distcp in my tests. Besides, s3n gave faster transfers than s3.
# copy files from HDFS to s3n
hadoop fs -cp hdfs://namenode.company.com/logsfolder/logs s3n://awsid:awskey@bucket/folder
# distcp alternative (distributed copy via MapReduce; better for large datasets)
hadoop distcp hdfs://namenode.company.com/logsfolder/logs s3n://awsid:awskey@bucket/folder