PyData happened in San Francisco two weeks ago and I’m happy to say that I was fortunate enough to be one of the speakers at this fine event. It was three exciting days of meeting interesting people and listening to insightful…
Everybody who has taken a machine learning course probably knows the geometric intuition behind a support vector machine (SVM, great book): An SVM is a large-margin classifier. In other words, it maximizes the geometric distance between the decision boundary and the classes of samples.…
A challenge that machine learning practitioners often face is how to deal with skewed classes in classification problems. Such a tricky situation occurs when one class is over-represented in the data set. A common example of this issue is fraud…
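To make the large-margin idea concrete, here is a minimal sketch in plain Python. It does not train an SVM; it only measures the geometric margin of two hand-picked linear boundaries on a toy data set. The function name `geometric_margin`, the sample points, and both candidate boundaries are assumptions made up for illustration.

```python
import math

def geometric_margin(w, b, samples):
    """Smallest signed distance from any labelled sample to the hyperplane w.x + b = 0.

    A positive result means every sample lies on the correct side of the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in samples
    )

# Hypothetical toy data: two linearly separable classes labelled +1 and -1.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Two hand-picked boundaries that both separate the classes.
margin_a = geometric_margin((1.0, 1.0), -2.0, samples)  # boundary x + y = 2
margin_b = geometric_margin((1.0, 0.0), -1.5, samples)  # boundary x = 1.5

print(f"margin of x + y = 2: {margin_a:.3f}")
print(f"margin of x = 1.5:   {margin_b:.3f}")
```

Both boundaries classify every sample correctly, but the first one keeps the nearest samples farther away. A large-margin classifier prefers exactly that kind of boundary.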
Half a year ago, I was working in the heart of Silicon Valley and attended many meetups and networking parties – yes, I would call them parties rather than events. It became obvious to me that I wanted to try…
In one of my previous posts about Nutch, I already mentioned plugins. The plugin system is central to how Nutch works and allows you to customize Nutch to your personal needs in a very flexible and maintainable way. Everybody who…
This is going to be an ongoing article series about various aspects of Machine Learning. In the first post of the series I’m going to explain why I decided to learn and use R, and why it is probably the best statistical…
As you might already have noticed by now, one of my big interests is IT, especially new developments and trends within that area. It just so happens that in my spare time I like to do many kinds of sports and…
After installing Nutch as described in my previous post, you can either follow this tutorial without thinking much about it, or get a sense of how Nutch actually works beforehand. I recommend doing both in parallel. And since you won’t find…
Nutch is a flexible and powerful open source tool for web crawling, developed by the Apache Software Foundation and its community. It builds on Apache Solr and comes with an integration of the highly popular Apache Hadoop, which actually started…
Besides my interest in Cloud Computing, I recently started to focus more and more on Machine Learning and became passionate about it. To me, it is a truly fascinating field of work! However, when I talk to friends, more…
One of the hottest topics in Silicon Valley these days is “Big Data”. It feels like every meetup or event in the Bay Area is somehow connected to it, and once an event is specifically about Big Data, sure enough…