• When you have been exploring a dataset for a while, studying its distribution, its composition, its quirks, its innards and its “essence” in the end.

  • When you know the answer to a query even before performing it.

  • When you have a strict hierarchical organization of the folders for plots.

  • When you know by heart the number of unique URLs and usernames in the dataset.

  • When your methodology to name files according to their schema has become more complex that the schemas themselves.

  • When you have five different versions of the dataset, but you forgot the reason behind four of them.

  • When the size of the scripts to analyze the dataset begins to rival the dataset itself.

  • Ultimately, when the only thought of putting again your hands on that data gives you urticaria.

That’s when you grokked the data.

Now imagine doing that on hundreds of gigabytes…