Small Data >/= Big Data

Written by Max Cherney

There’s a lot of confusion about big data. But don’t be fooled by meaningless jargon. Sometimes “small data” is more valuable.

The term “big data” gets thrown around a lot these days. It’s buzzworthy, but unfortunately it has devolved into meaningless jargon.

And I’m not alone in this belief. Stephen Few, one of the first people to embrace the potential of big data, has gone on record saying the same thing. Others, like Edd Dumbill, agree. And more recently, Samuel Arbesman opined along similar lines in The Washington Post.

Few, Dumbill and Arbesman all agree: big data is more myth than reality.

Image credit: Wikimedia Commons

At this point, it’s no longer a matter of opinion. A Microsoft research paper from 2012 exposed a startling misuse of big data at both Microsoft and Yahoo.

The researchers discovered several Hadoop installations that weren’t thought through: each was processing less than 14 gigabytes of data. That’s a really small number, and most certainly not big data.

Keep in mind that Microsoft and Yahoo are industry-leading technology companies. If big data is getting misused there, one has to wonder about how it’s being misinterpreted elsewhere.

Even more interesting, the report states that in many cases it is simply easier and cheaper to upgrade an existing machine with more RAM and hard disk space. Even adding CPUs is a viable alternative.

That’s not to say there’s no such thing as big data, however. Quite the contrary: We’ve always had big data. But in the past, especially 20 or so years ago, big data looked quite a bit different than it does today — especially because storage has become much, much cheaper.

What is big data then?

The definition of big data is simple: Big data is a data set that won’t fit onto a single drive. And currently that’s a couple of terabytes.
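To make that rule of thumb concrete, here is a minimal sketch in Python. The 2-terabyte threshold and the function name `is_big_data` are assumptions chosen for illustration; they stand in for "a couple of terabytes" and are not figures from the Microsoft paper.

```python
# Assumption: "a couple of terabytes" is taken as 2 TB (decimal units).
SINGLE_DRIVE_BYTES = 2 * 10**12

GB = 10**9
TB = 10**12


def is_big_data(dataset_bytes, drive_bytes=SINGLE_DRIVE_BYTES):
    """Return True when the data set will not fit onto a single drive."""
    return dataset_bytes > drive_bytes


# The Hadoop jobs described above handled less than 14 GB each.
print(is_big_data(14 * GB))  # False
# A hypothetical 5 TB data set exceeds the assumed single-drive limit.
print(is_big_data(5 * TB))   # True
```

Raising `drive_bytes` as drives get larger shows why the bar for big data keeps moving.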

That’s also why I think the definition of big data is constantly evolving. As storage continues to get cheaper, the label applies only to data sets that are truly gargantuan.

So who needs big data? IT departments have to consider that on a case-by-case basis, but it’s clear that only companies with very large data sets, like the type Google maintains, actually require big data software and hardware.

For most organizations, regular old “small” data will do just fine.

For, I’m Max Cherney.

Based in San Francisco, Max A. Cherney is a tech journalist. He contributes tech info and articles here at Email questions or tips to:
