Large Scale CTR Prediction – Lessons Learned

PyData happened in San Francisco two weeks ago and I’m happy to say that I was fortunate enough to be one of the speakers at this fine event. It was three exciting days of meeting interesting people and listening to insightful … read more

Logistic Regression – Geometric Intuition

Everybody who has taken a machine learning course probably knows the geometric intuition behind a support vector machine (SVM, great book): An SVM is a large margin classifier. In other words, it maximizes the geometric distance between the decision boundary and the classes of samples. … read more

Nutch – Plugin Tutorial


In one of my previous posts about Nutch, I already mentioned plugins. The plugin system is central to how Nutch works and allows you to customize Nutch to your personal needs in a very flexible and maintainable way. Everybody who … read more

Nutch – How It Works


After installing Nutch as described in my previous post, you can either follow this tutorial without having to think about it, or get a sense of how Nutch actually works beforehand. I recommend doing both in parallel. And since you won’t find … read more

Nutch – Installation


Nutch is a flexible and powerful open source tool for web crawling, developed by the Apache Software Foundation and its community. It builds on Apache Solr and comes with an integration of the highly popular Apache Hadoop, which actually started … read more

Big Data – Why All of a Sudden?

One of the hottest topics in Silicon Valley these days is “Big Data”. It feels like every meetup or event in the Bay Area is somehow connected to it, and once an event is specifically about Big Data, sure enough … read more