I was watching Jeff Dean’s keynote at the 2010 ACM Symposium on Cloud Computing (SOCC), held yesterday, and I found this very interesting bit of information. It is so useful that every computer scientist and engineer should learn it by heart!
| Operation | Time (ns) |
| --- | --- |
| L1 cache reference | 0.5 |
| L2 cache reference | 7 |
| Main memory reference | 100 |
| Compress 1 KB with Zippy | 3,000 |
| Send 2 KB over 1 Gbps network | 20,000 |
| Read 1 MB sequentially from memory | 250,000 |
| Round trip within same datacenter | 500,000 |
| Read 1 MB sequentially from disk | 20,000,000 |
| Send packet CA -> Netherlands -> CA | 150,000,000 |
These numbers give you some insight into why random reads from a disk are a really bad idea.
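A quick sketch of why: using the table’s sequential-read figure plus an assumed ~10 ms disk seek time (a typical value for spinning disks; the seek number is my assumption, not part of the table), compare reading 1 MB sequentially versus as 256 random 4 KB reads:

```python
# Latencies in nanoseconds; sequential figure is from the table above.
READ_1MB_DISK_SEQ_NS = 20_000_000   # read 1 MB sequentially from disk
DISK_SEEK_NS = 10_000_000           # ASSUMPTION: ~10 ms per seek on a spinning disk

# Reading 1 MB as 256 random 4 KB reads pays one seek per read;
# the transfer time itself is negligible next to the seeks.
random_reads = 1_048_576 // 4096            # 256 chunks
random_ns = random_reads * DISK_SEEK_NS

print(f"sequential: {READ_1MB_DISK_SEQ_NS / 1e6:.0f} ms")      # 20 ms
print(f"random:     {random_ns / 1e6:.0f} ms")                 # 2560 ms
print(f"slowdown:   {random_ns / READ_1MB_DISK_SEQ_NS:.0f}x")  # 128x
```

Under these assumptions, random access is more than a hundred times slower for the same megabyte of data.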
This piece of information complements the very nice image from Adam Jacobs and his excellent “The Pathologies of Big Data” article.
What should we learn from all this stuff?
- Do your back-of-the-envelope calculations
- Do avoid random operations
- Do benchmark your system
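As one back-of-the-envelope example using only the table’s numbers: is it faster to fetch 1 MB from another machine’s memory in the same datacenter, or to read it from local disk?

```python
# Latencies in nanoseconds, taken from the table above.
ROUNDTRIP_DC_NS = 500_000        # round trip within the same datacenter
SEND_2KB_1GBPS_NS = 20_000       # send 2 KB over a 1 Gbps network
READ_1MB_DISK_NS = 20_000_000    # read 1 MB sequentially from disk

# 1 MB is 512 chunks of 2 KB; at 1 Gbps the transfer time scales linearly.
remote_ns = ROUNDTRIP_DC_NS + 512 * SEND_2KB_1GBPS_NS

print(f"remote memory: {remote_ns / 1e6:.2f} ms")         # 10.74 ms
print(f"local disk:    {READ_1MB_DISK_NS / 1e6:.2f} ms")  # 20.00 ms
```

Roughly twice as fast to pull the data over the network from a peer’s RAM than to read it from your own disk, which is exactly the kind of conclusion a two-minute envelope calculation with these numbers can give you.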