Numbers everyone should know
I was watching Jeff Dean’s keynote at the ACM Symposium on Cloud Computing 2010 (SoCC), held yesterday, and found this very interesting bit of information. It is so useful that every computer scientist and engineer should learn it by heart!
Operation                               Time (nsec)
L1 cache reference                              0.5
Branch mispredict                                 5
L2 cache reference                                7
Mutex lock/unlock                                25
Main memory reference                           100
Compress 1K bytes with Zippy                  3,000
Send 2K bytes over 1 Gbps network            20,000
Read 1MB sequentially from memory           250,000
Roundtrip within same datacenter            500,000
Disk seek                                10,000,000
Read 1MB sequentially from disk          20,000,000
Send packet CA -> Netherlands -> CA     150,000,000
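Raw nanoseconds are hard to feel, so one trick is to rescale them to human time. The sketch below is a minimal illustration in Python; the names `LATENCY_NS`, `human_scale` and `pretty` are made up for this example. It stretches time so that one L1 cache reference takes one second. On that scale a main memory reference takes a little over three minutes, a disk seek about 231 days, and the CA -> Netherlands -> CA packet about 9.5 years.

```python
# Latency figures copied from the table above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1MB sequentially from memory": 250_000,
    "Roundtrip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1MB sequentially from disk": 20_000_000,
    "Send packet CA -> Netherlands -> CA": 150_000_000,
}


def human_scale(latencies, base="L1 cache reference"):
    """Rescale every latency so that `base` takes exactly one second.

    Returns a dict mapping each operation to its scaled duration in seconds.
    """
    base_ns = latencies[base]
    return {op: ns / base_ns for op, ns in latencies.items()}


def pretty(seconds):
    """Format a duration in seconds using the largest unit that fits."""
    for unit, size in (("years", 365 * 86_400), ("days", 86_400),
                       ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for op, secs in human_scale(LATENCY_NS).items():
        print(f"{op:<40} {pretty(secs)}")
```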
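These numbers give you some insight into why random reads from a disk are a really bad idea: a single disk seek (10 ms) costs half as much as reading an entire megabyte sequentially (20 ms), so reading data in small, scattered pieces spends almost all of its time moving the disk head instead of transferring bytes.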
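To put a figure on it, here is a rough back-of-the-envelope sketch in Python that uses only the "Disk seek" and "Read 1MB sequentially from disk" rows. It assumes 1 GB of data, that a sequential scan pays one seek and then streams at the table's rate, and that a random read fetches one 4 KB block (a block size picked purely for illustration) and pays a full seek every time. It ignores caching, rotational latency and request reordering, and the function names are hypothetical. Under these assumptions the sequential scan takes about 20.5 seconds while the random reads take about 44 minutes, roughly 130 times longer.

```python
# Two rows from the latency table, in nanoseconds.
DISK_SEEK_NS = 10_000_000
DISK_READ_1MB_NS = 20_000_000

MB = 1024 * 1024
GB = 1024 * MB


def sequential_read_ns(total_bytes):
    """One seek to reach the data, then stream it at the sequential rate."""
    return DISK_SEEK_NS + total_bytes / MB * DISK_READ_1MB_NS


def random_read_ns(total_bytes, block_bytes=4096):
    """Assume every block costs its own seek plus its share of transfer time."""
    blocks = total_bytes // block_bytes
    transfer_ns = total_bytes / MB * DISK_READ_1MB_NS
    return blocks * DISK_SEEK_NS + transfer_ns


if __name__ == "__main__":
    seq = sequential_read_ns(GB)
    rnd = random_read_ns(GB)
    print(f"1 GB sequential: {seq / 1e9:.1f} s")            # about 20.5 s
    print(f"1 GB in random 4 KB reads: {rnd / 1e9:.0f} s")  # about 2642 s (~44 min)
    print(f"random / sequential: {rnd / seq:.0f}x")
```

Changing `block_bytes` shows the same lesson from another angle: the larger each read, the fewer seeks are paid and the closer random access gets to sequential throughput.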
This piece of information complements the very nice image from Adam Jacobs and his excellent “The Pathologies of Big Data” article.
[Image: Random is BAD (and SSD is NOT going to solve the problem)]
What should we learn from all this stuff?
- Do your back-of-the-envelope calculations
- Do avoid random operations
- Do benchmark your system
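A benchmark does not have to be elaborate to be useful. The following is a minimal Python sketch, with hypothetical helper names `time_pass` and `benchmark`, that times summing the same list once in index order and once in a shuffled order. It is an in-memory illustration of sequential versus random access, not a disk benchmark, and in CPython the interpreter overhead hides much of the hardware gap, so expect a much smaller ratio than the table suggests; the exact numbers depend on your machine.

```python
import random
import time


def time_pass(data, order):
    """Sum data[i] over the indices in `order`; return (elapsed seconds, sum)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return time.perf_counter() - start, total


def benchmark(n=1_000_000, repeats=3, seed=0):
    """Return the best-of-`repeats` time for sequential and shuffled access."""
    data = list(range(n))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indices, random order
    seq_best = min(time_pass(data, sequential)[0] for _ in range(repeats))
    rnd_best = min(time_pass(data, shuffled)[0] for _ in range(repeats))
    return seq_best, rnd_best


if __name__ == "__main__":
    seq, rnd = benchmark()
    print(f"sequential: {seq:.3f} s  random: {rnd:.3f} s  ratio: {rnd / seq:.1f}x")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; for real decisions, benchmark the actual workload on the actual hardware.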