Earlier this year, The New York Times famously declared 2012 the dawn of the “Age of Big Data” — an era when previously incomprehensible mountains of the world’s information can be distilled down into useful information. You’re already contributing to that mountain when you perform a Google search, buy something on Amazon, or upload a photo to Facebook. You benefit from companies refining it behind the scenes whenever Google finds exactly what you were looking for. Or a website displays an online ad for something you actually want. Or even when Facebook suggests people you already know as friends.
But the potential for Big Data goes much deeper, to the point where we may be able to calculate nearly all aspects of life.
Despite the countless papers, articles, and blog posts dedicated to the buzzword, Big Data remains a vague concept for most Web users. So for this week’s State of the Web, let’s take a look at a few of the most important aspects of this awesome, terrifying thing called Big Data, and what it could mean for the everyday person.
What is Big Data?
Big Data refers not just to the amount of information available, but also to the ability of our computer systems to store and process this information economically. This evolutionary drop in the cost of computing power has taken “lots of data” and turned it into “Big Data,” which has a few important prerequisite qualities: variety, volume, and velocity. These terms were first attributed to Big Data by Gartner researcher Doug Laney in 2001 (PDF).
It’s data, in a general sense (variety): Let’s just get this first part out of the way: When people talk about Big Data, they aren’t necessarily referring to one type of information. In fact, they could be referring to any type of data: Facebook status updates, tweets on Twitter, digital images, closed-circuit camera feeds, medical histories, credit-card transactions, consumer purchasing histories, climate information, GPS location data, on and on.
If it can be stored on a computer, then it can be part of Big Data.
It’s big (volume): The key word, of course, is “big,” and that means really big. In 2011 alone, we created or replicated 1.8 zettabytes (1.8 trillion gigabytes) of data, a number that is set to double in 2013, according to EMC (PDF). However, what constitutes “big” for one company is minuscule for another. Facebook, for example, currently stores over 100 petabytes (100 million gigabytes) of images on its servers. The particle physics experiments at CERN pump out 40 terabytes of data every second. But a recent study from data management company Actian Corporation shows that businesses that deal with large amounts of data typically define “big” as between 1 terabyte and 1 petabyte.
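To make those scales concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are illustrative names, not part of any tool mentioned in this article.

```python
from fractions import Fraction

# Decimal (SI) storage units expressed in bytes (assumption: powers of 1,000,
# not the binary powers of 1,024 that some tools report).
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "zettabyte": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two units using exact rational arithmetic."""
    # Going through str() keeps values like 1.8 exact instead of a float approximation.
    exact_amount = Fraction(str(amount))
    return exact_amount * Fraction(UNITS[from_unit], UNITS[to_unit])


if __name__ == "__main__":
    for amount, unit in [(1.8, "zettabyte"), (100, "petabyte"), (40, "terabyte")]:
        gigabytes = convert(amount, unit, "gigabyte")
        print(f"{amount} {unit}s = {int(gigabytes):,} gigabytes")
```

Under that assumption, 1.8 zettabytes works out to 1.8 trillion gigabytes, 100 petabytes to 100 million gigabytes, and 40 terabytes to 40,000 gigabytes.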
It’s quick, sometimes (velocity): A third aspect of Big Data is the rate at which information flows into a system. Twitter, for example, processes an average of about 5,000 tweets per second, according to the company’s open source manager Chris Aniszczyk. This number can jump significantly during high-profile events, like the Super Bowl or a major natural disaster. Other areas with high-velocity data include financial transactions, weather data, GPS coordinates, and sensor feeds from scientific equipment.
The big question of Big Data is this: What use does all this information hold? At the moment, we don’t really know, and that’s what makes Big Data so exciting for so many industries. Companies already have mountains of information, much of it about you and me, but about 80 percent, by some estimates, is in a form that is difficult (but not impossible) for computers to “understand.” This data is called “unstructured,” and includes things like JPEG images, audio files, video files, and even many text files, including email, text messages, and blog posts. The challenge companies now face is figuring out how to turn their unstructured data into usable information, a challenge they are quickly overcoming thanks to new applications like Google’s BigQuery and Dremel tools.
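As a toy illustration of what turning unstructured text into usable information can look like, the sketch below pulls named fields out of a free-text message. The message, the field names, and the `extract_record` helper are all hypothetical, and this is not how any particular product works; real systems handle far messier input at far larger scale.

```python
import re

# Hypothetical free-text message, standing in for "unstructured" data.
MESSAGE = "Order 1042 shipped to Austin on 2012-06-14, total $59.99."

# Named groups describe the structure we hope to find in the text.
PATTERN = re.compile(
    r"Order (?P<order_id>\d+) shipped to (?P<city>[A-Za-z ]+) "
    r"on (?P<date>\d{4}-\d{2}-\d{2}), total \$(?P<total>\d+\.\d{2})"
)


def extract_record(text):
    """Return a dict of structured fields, or None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    record = match.groupdict()
    # Convert numeric fields from strings so they can be sorted or summed.
    record["order_id"] = int(record["order_id"])
    record["total"] = float(record["total"])
    return record


if __name__ == "__main__":
    print(extract_record(MESSAGE))
```

Once the fields are pulled out, the message can be stored, sorted, and counted like any other row in a database, which is the sense in which the data has become "structured."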
What about now?
While we already benefit from Big Data every time we use the Web, the potential applications of Big Data extend far beyond obvious things like online search and ads. The areas where Big Data is expected to have the most immediate, revolutionary effects are business and health care.
Google and Facebook built their entire businesses on Big Data by creating services (search, connecting with friends) that are both derived from, and fueled by, massive amounts of data handed to them by users. The more features they offer, the larger their Big Data collections become, which in turn results in even more online products (to sell advertising around — a service that is itself powered by Big Data).
In other words, Google and Facebook figured out a way to monetize the data they collected. Big Data is their business. But countless other companies are looking to Big Data to provide insights into their business that were never before possible. Companies can use Big Data to tweak their advertising, prices, production operations, shipping activities, and hiring processes. At the moment, it seems, the possible uses of Big Data for business are only limited by the technical abilities of our computers and our imaginations.
In addition to making people money, Big Data is becoming increasingly useful in the field of medicine. Companies like DNAnexus and Appistry are looking to harness the vast amounts of data created by genome sequencing to help discover cures for disease far faster than has been possible. Startup Apixio is looking to bring medical records into the cloud to allow doctors to better choose treatments for their patients. Even IBM’s “Jeopardy!”-winning supercomputer Watson, which uses Big Data to power its artificial intelligence, is lending a helping hand, thanks to a partnership with WellPoint that will allow patients to access hoards of data to help them make health-care decisions.
Big and getting bigger
The reach of Big Data doesn’t end there. Governments, scientists, militaries, and non-governmental organizations have all begun to tap into the vast power of Big Data. That power only increases as different data sets combine to offer more insight, to solve more problems, to answer deeper questions, to predict the future in ways that are impossible today.
For average people like you and me, Big Data will provide countless new services and resources. It may even save our lives. But like all great advancements in human history, Big Data comes at a cost. Innumerable aspects of our lives (our habits, our moods, our medical histories, our personalities, our weaknesses and strengths, where we go, who we talk to, what we love and hate and fear) are all being amassed in nameless data centers around the world. This information may one day be used to assess whether or not you are good for a job, or a school, or whether you should have children. Companies and governments will surely know more about you and your future than you do. (In many ways, they already do.) So as Big Data gets even bigger, and the information squeezed out of it becomes more plentiful, profitable, and potent, we need to make sure this quickly moving tsunami of information doesn’t drown us in its wake.