- When you have been exploring a dataset for a while, studying its distribution, its composition, its quirks, its innards and its “essence” in the end.
- When you know the answer to a query even before performing it.
- When you have a strict hierarchical organization of the folders for plots.
- When you know by heart the number of unique URLs and usernames in the dataset.
- When your methodology to name files according to their schema has become more complex that the schemas themselves.
- When you have five different versions of the dataset, but you forgot the reason behind four of them.
- When the size of the scripts to analyze the dataset begins to rival the dataset itself.
- Ultimately, when the only thought of putting again your hands on that data gives you urticaria.
That’s when you grokked the data.
Now imagine doing that on hundreds of gigabytes…