Category Machine Learning

Logistic Regression – Geometric Intuition

Everybody who has taken a machine learning course probably knows the geometric intuition behind a support vector machine (SVM): an SVM is a large-margin classifier. In other words, it maximizes the geometric distance between the decision boundary and the nearest samples of each class.…

Nutch

Nutch – Plugin Tutorial

In one of my previous posts about Nutch, I already mentioned plugins. The plugin system is central to how Nutch works and allows you to customize Nutch to your personal needs in a very flexible and maintainable way. Everybody who…

Nutch

Nutch – How It Works

After installing Nutch as described in my previous post, you can either follow this tutorial step by step without thinking much about it, or first get a sense of how Nutch actually works. I recommend doing both in parallel. And since you won’t find…

Nutch

Nutch – Installation

Nutch is a flexible and powerful open-source tool for web crawling, developed by the Apache Software Foundation and its community. It builds on Apache Solr and integrates with the highly popular Apache Hadoop, which actually started…