Big data is big for a lot of reasons. Some are literal (it involves massive datasets) and some are based on the promise of what it could one day deliver. For instance, IDC estimates the digital universe will reach 44 zettabytes (that's 44 trillion gigabytes) by 2020, and the big data inside it offers potentially huge amounts of actionable and mind-blowing insights.
At Equinix, we’re into helping uncover all of it. But a first step is understanding some key big data definitions. That’s what our “How to Speak Like a Data Center Geek” series is for.
We’ll start basic on our first big data entry, since the list of definitions associated with big data is … big.
Too obvious? Well, we wanted to expand the big data definition a bit beyond what's clear just by reading it – namely, that it involves "big" amounts of "data." A geek can do better. Here's a solid definition from McKinsey: "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." So maybe big data could also accurately be called "too big" data?
In an influential 2001 report, analyst Doug Laney (then at META Group, which Gartner later acquired) laid out the defining dimensions of big data, and they all happen to begin with "V":
- Volume: This refers to the depth and breadth of the data that must be managed, and it is always growing. For instance, IBM says we create 2.5 quintillion bytes of data every day. That's enough to fill 10 million Blu-ray discs.
- Variety: This is the diversity of the types of data that make up big data datasets. It could be from video, audio, text, photos, etc., and proper analysis involves reconciling it all.
- Velocity: The sheer and increasing speed with which data is acquired and used.
People have added or proposed more Vs over the years (value, veracity, variability), but it all starts with the 3Vs.
Structured data has a defined length and format, such as numbers and dates, and is usually stored in a database. It accounts for about 20% of the data out there, and its structured nature makes it easier to access and organize. So it is potentially powerful and widely usable.
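To make that concrete, here's a minimal sketch of structured data in action. The "orders" table and its rows are hypothetical, but they show the point: when every value has a defined type and format, organizing and querying the data is trivial.

```python
import sqlite3

# Structured data: fixed columns with defined types, queryable with SQL.
# The "orders" table and its contents here are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2016-01-05", 19.99), (2, "2016-01-06", 5.49), (3, "2016-01-06", 42.00)],
)

# Because length and format are predefined, aggregation is a one-liner.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))  # 67.48
```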
Unstructured data does not follow a predefined data model or fit neatly into relational databases. Examples include video, the text of email messages and social media posts. It makes up the bulk of the big data universe and has huge potential, but it also presents bigger challenges for those trying to organize it and gain insight from it.
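Why is that harder? With unstructured data, any structure has to be imposed after the fact. The sketch below, using a made-up email snippet, shows the kind of extraction work (pulling out dates, counting terms) that analysis of free-form text typically starts with.

```python
import re
from collections import Counter

# Unstructured data: free-form text with no predefined schema.
# This sample "email" body is a hypothetical illustration.
email_body = """Hi team, the Q3 launch moved to 2016-10-03.
Loop in data-center ops; ops flagged capacity, capacity, capacity!"""

# Gaining insight means imposing structure after the fact, e.g.
# extracting dates and rough word frequencies from the raw text.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", email_body)
words = Counter(re.findall(r"[a-z]+", email_body.lower()))

print(dates)              # ['2016-10-03']
print(words["capacity"])  # 3
```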
DataInformed has a concise definition of big data analytics: "Using software-based algorithms and statistics to derive meaning from data." But the reality is that big data analytics could have an entire Geek entry of its own (and maybe someday, it will). Here are a few subgroups of big data analytics: behavioral analytics, event analytics, location analytics, text analytics. The bottom line is that without good analytics, big data is akin to a mountainous pile of papers dumped on the floor of a 100-acre warehouse. Big data analytics makes big data make sense.
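Here's a toy version of "deriving meaning from data": descriptive statistics over a hypothetical series of per-hour request counts, with a simple rule that flags hours far from the mean. Real big data analytics applies the same idea at vastly larger scale and with far more sophisticated algorithms.

```python
import statistics

# Hypothetical per-hour request counts; one hour is clearly unusual.
requests_per_hour = [120, 131, 118, 140, 980, 125, 133]

mean = statistics.mean(requests_per_hour)
stdev = statistics.stdev(requests_per_hour)

# A toy event-analytics rule: flag any hour more than two standard
# deviations from the mean as an anomaly worth investigating.
anomalies = [x for x in requests_per_hour if abs(x - mean) > 2 * stdev]
print(anomalies)  # [980]
```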
Here's more about Equinix and big data.
And while we're doling out additional info, we should note that data center geeks have a thing for interconnection, since it's essential for the enterprise to compete. Download Equinix's IOA Playbook, which describes an interconnection-first architecture that securely connects people, locations, clouds and data.
Also, check out every post in the “Speak Like a Data Center Geek” series. (NOTE: Binge readers welcome.):
Part 13: Big Data (see above)