One of the hottest topics in Silicon Valley these days is “Big Data”. It feels like every meetup or event in the Bay Area is somehow connected to it, and whenever an event is specifically about Big Data, it is sure to be vastly overcrowded. A couple of days ago I attended the SVForum Main Event “What’s Hot & What’s Not for 2012”, and it took only 5 minutes until Big Data was the center of attention. But what are the reasons for the sudden hype? After all, companies like Walmart were already storing tremendous amounts of data years ago. The overwhelming attention to Big Data surprised me even more once I took a look at the target market: it does not aim at the B2C market, where it is easier to create a lot of buzz. Instead, it is mainly a B2B topic. So I tried to wrap my head around this phenomenon, and now I want to share my personal point of view on why Big Data is currently such an important topic.
Data storage costs became “too cheap to matter” (as Chris Anderson puts it in his interesting book “Free: The Future of a Radical Price”)
While data storage capacity is increasing, its cost is falling. According to Chris Anderson, this is happening even faster than Moore’s Law predicts. Data storage has become so cheap that it basically doesn’t matter to companies how much data they store, because the potential earnings from every stored bit far exceed its cost.
Increased amount, velocity and variety of data
I guess it is common knowledge today that we create more data than ever. What is interesting, though, is the pace of data growth. As the picture on the right illustrates, the research company IDC predicts overwhelming exponential data growth in its study “Digital Universe”. Also, at a Big Data event I attended this week, Joydeep Das mentioned that more data is created worldwide per year than can actually be stored.
Making sense of these findings by looking at my daily life turns out to be quite easy: I use the Internet A LOT, I am increasingly active on social media platforms (Facebook, Twitter, the German social shopping site Stylight, Pinterest or its German equivalent Pinspire, …) and I can feel ubiquitous computing coming true more and more (smart meters, smartphones with location tracking, Nest, …). All three categories – Internet usage, social media and sensors – are ideal for tracking and storing a vast amount of data about me and the environment.
Additionally, as I recently learned, Big Data is not only about volume. In the Gartner article “Big Data Challenges for the IT Infrastructure Team”, the authors argue that, next to volume, velocity and variety are also driving the Big Data paradigm. Velocity in a way depends on volume: the more data I create, the faster I need to process it. And variety describes the increased heterogeneity of the data being created: sensor data, social media data, etc.
Amount of available public data increases
But wait, isn’t Big Data only interesting for really big or web-based companies? I think the answer should be “no”, because “open data” is increasingly gaining momentum. Recently, governments all over the world have started to make data available to the public. Just take a look at data.gov, which was launched in 2009. In addition, online platforms are emerging that promote the disclosure of data and distribute open datasets. Factual, for example, is an interesting Silicon Valley startup in this area. Combined with all the public data available on the Internet, the private data of smaller companies suddenly becomes more valuable and…big.
Maturity of open source software to handle Big Data
The first time I heard about Apache Hadoop was in October of last year. Ever since, I have tried to stay up to date on the topic, and I am fascinated by the speed and scale at which the community and products around Hadoop evolve. Companies like Google and Yahoo were already using Big Data technology years ago. However, for the majority of people, even in the IT industry, this technology was not accessible. With Hadoop becoming more mature and gaining attention, all of a sudden everybody can experiment with parallel processing: rent some servers on Amazon EC2 and follow the Hadoop tutorial – it’s that easy (at least in theory)! In my opinion, this sudden rise in the accessibility of Big Data technology acted as the tipping point for Big Data.
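To give a rough feel for the kind of parallel processing Hadoop makes accessible, here is a minimal sketch of the map/reduce pattern as a word count in plain Python. It is not Hadoop code and uses no Hadoop API: the function names (`map_words`, `reduce_counts`, `word_count`) are made up for illustration, and a local thread pool merely stands in for the cluster of servers that would do the work in a real setup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines, workers=2):
    """Apply the map step to each line via a worker pool, then reduce.

    Assumption: the thread pool is only a local stand-in for cluster nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))
    return reduce_counts(chain.from_iterable(mapped))


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

The point of the pattern is that the map step looks at each line independently, so the lines can be spread across as many machines as you care to rent, while the reduce step only has to combine the small per-word results.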