Kenkyuu

I was watching Jeff Dean’s keynote presentation for the ACM Symposium on Cloud Computing 2010 (SOCC), which was held yesterday, and I found this very interesting bit of information. It is so useful that every computer scientist and engineer should learn it by heart!

Operation                            Time (nsec)
L1 cache reference                           0.5
Branch mispredict                              5
L2 cache reference                             7
Mutex lock/unlock                             25
Main memory reference                        100
Compress 1K bytes with Zippy               3,000
Send 2K bytes over 1 Gbps network         20,000
Read 1MB sequentially from memory        250,000
Round trip within same datacenter        500,000
Disk seek                             10,000,000
Read 1MB sequentially from disk       20,000,000
Send packet CA -> Netherlands -> CA  150,000,000

These numbers give you some insight into why random reads from a disk are a really bad idea.
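To make that concrete, here is a rough back-of-the-envelope sketch in Python that uses only the disk seek and sequential disk read figures from the table. The workload is my own assumption, not something from the keynote: reading 1 GB either in one sequential pass or as 4 KB records that each need their own seek. The function names are hypothetical, and the model ignores caching, request queuing and rotational effects.

```python
# Latencies in nanoseconds, copied from the table above.
DISK_SEEK_NS = 10_000_000
READ_1MB_DISK_NS = 20_000_000


def sequential_read_ns(total_mb: int) -> int:
    """Estimate one seek followed by streaming total_mb megabytes."""
    return DISK_SEEK_NS + total_mb * READ_1MB_DISK_NS


def random_read_ns(total_mb: int, record_kb: int) -> int:
    """Estimate reading total_mb megabytes as records of record_kb each.

    Assumption: every record costs one full seek, and the transfer
    time is the same as for a sequential read of the same volume.
    """
    records = total_mb * 1024 // record_kb
    return records * DISK_SEEK_NS + total_mb * READ_1MB_DISK_NS


if __name__ == "__main__":
    seq = sequential_read_ns(1024)   # 1 GB, one pass
    rnd = random_read_ns(1024, 4)    # 1 GB as 4 KB records (assumed workload)
    print(f"sequential: {seq / 1e9:.2f} s")
    print(f"random:     {rnd / 1e9:.2f} s")
    print(f"slowdown:   {rnd / seq:.0f}x")
```

Under these assumptions the sequential pass takes roughly 20 seconds, while the random version takes around 44 minutes, more than a hundred times slower. The exact ratio depends heavily on the record size you assume, which is precisely why it is worth doing the calculation.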

This piece of information complements the very nice image by Adam Jacobs and his excellent “The Pathologies of Big Data” article.

[caption id="" align="alignnone" width="468"] Comparison of random and sequential speeds for memory, SSD and disk. Random is BAD (and SSD is NOT going to solve the problem).[/caption]

What should we learn from all this stuff?

Do your back-of-the-envelope calculations
Do avoid random operations
Do benchmark your system
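
As a starting point for the benchmarking advice, here is a minimal sketch that times the same sum over an in-memory list twice, once visiting the elements in order and once in a shuffled order. Everything in it (the function names, the list size, the fixed seed) is an assumption chosen for illustration. It only exercises memory, not disk, and in CPython the interpreter overhead hides much of the gap, so treat the output as a rough indication for your own machine rather than a reproduction of the numbers above.

```python
import random
import time


def timed_sum(data, order):
    """Sum data[i] for each i in order; return (total, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start


def compare_access(n, seed=0):
    """Time sequential versus shuffled access over n integers."""
    data = list(range(n))
    sequential = list(range(n))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability

    seq_total, seq_s = timed_sum(data, sequential)
    rnd_total, rnd_s = timed_sum(data, shuffled)
    # Both passes touch every element exactly once, so the sums must match.
    assert seq_total == rnd_total
    return seq_s, rnd_s


if __name__ == "__main__":
    seq_s, rnd_s = compare_access(5_000_000)
    print(f"sequential: {seq_s:.3f} s")
    print(f"shuffled:   {rnd_s:.3f} s")
```

Run it a few times and with different sizes: a single measurement tells you very little, and the difference tends to grow once the data no longer fits in the CPU caches.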