Programming Collective Intelligence

Programming Collective Intelligence is a great book. It covers most of the established data mining algorithms and presents many applications for them. It covers clustering (k-means, hierarchical), supervised classification (k-nearest neighbours, Naïve Bayes, decision trees, SVM), data analysis (non-negative matrix factorization), optimisation (hill climbing, simulated annealing and genetic algorithms), and ends with genetic programming. Along the way, it presents applications such as spam detection, pricing, recommendation, … If you want to get started in data mining, this book is a very good way to do it.

Examples are given in Python, a language I had never used. Nevertheless, they are quite easy to follow. Python has a very concise syntax, which spares the book from hundreds of lines of code per example. Many third-party libraries are used, especially to connect to third-party services (Facebook, eBay, …) to build the datasets.

Compared to Collective Intelligence in Action, this book is more focused on data mining algorithms; for instance, there is no discussion of how to scale to big datasets. Nevertheless, it contains a lot more information, so I would recommend this book over the other one.
