Feeds:
Posts
Comments

Posts Tagged ‘big data’

Grokking data

  • When you have been exploring a dataset for a while, studying its distribution, its composition, its quirks, its innards and its “essence” in the end.
  • When you know the answer to a query even before performing it.
  • When you have a strict hierarchical organization of the folders for plots.
  • When you know by heart the number of unique URLs and usernames in the dataset.
  • When your methodology to name files according to their schema has become more complex that the schemas themselves.
  • When you have five different versions of the dataset, but you forgot the reason behind four of them.
  • When the size of the scripts to analyze the dataset begins to rival the dataset itself.
  • Ultimately, when the only thought of putting again your hands on that data gives you urticaria.
That’s when you grokked the data.
Now imagine doing that on hundreds of gigabytes…

Read Full Post »

« Newer Posts