2012 was for sure the year “big data” went ballistic, and throughout 2013 and 2014 it became commonplace and commodified. It is so prevalent nowadays in both industry and academia that it has almost lost any meaning. But when did this trend start? Or, to be more concrete, when was the term “big data” coined?

It was for sure before 2010. One possible culprit is Randal E. Bryant, who also coined the term DISC (Data-Intensive Scalable Computing), which I prefer over “big data” to describe tools such as Hadoop because it’s just much more precise. However, this happened just around the corner, in 2008.

You might think that “big data” is a recent thing, at least more recent than 2000. Well, think again. This paper by Diebold shows a few references from the ’90s. In particular, footnote 9 says:

On the academic side, Tilly (1984) mentions Big Data, but his article is not about the Big Data phenomenon and demonstrates no awareness of it; rather, it is a discourse on whether statistical data analyses are of value to historians. On the non-academic side, the margin comments of a computer program posted to a newsgroup in 1987 mention a programming technique called “small code, big data.” Fascinating, but off-mark. Next, Eric Larson provides an early popular-press mention in a 1989 Washington Post article about firms that assemble and sell lists to junk-mailers. He notes in passing that “The keepers of Big Data say they do it for the consumer’s benefit.” Again fascinating, but again off-mark. (See Eric Larson, “They’re Making a List: Data Companies and the Pigeonholing of America,” Washington Post, July 27, 1989.) Finally, a 1996 PR Newswire, Inc. release mentions network technology “for CPU clustering and Big Data applications…” Still off-mark, neither reporting on the Big Data phenomenon nor demonstrating awareness of it, instead reporting exclusively on a particular technology, the so-called high-performance parallel interface.

The best guess is that the term was coined in 1998 by John Mashey (retired former Chief Scientist at SGI), who produced a slide deck entitled “Big Data and the Next Wave of InfraStress”. However, the famous 3 V’s of big data came later, around 2001, introduced by Laney at Gartner.

