A lot of people seem to think performance is about doing the same thing, just doing it faster. That’s not what performance is all about. If you can do something really fast really well, people start using it differently.

Linus Torvalds (speaking about git)

Because more is not just more. More is different.

I was watching Jeff Dean’s keynote presentation for the ACM Symposium on Cloud Computing 2010 (SOCC) that was held yesterday and I found this very interesting bit of information. This is so useful that every Computer Scientist and Engineer should learn it by heart!

Operation                             Time (nsec)
L1 cache reference                            0.5
Branch mispredict                               5
L2 cache reference                              7
Mutex lock/unlock                              25
Main memory reference                         100
Compress 1K bytes with Zippy                3,000
Send 2K bytes over 1 Gbps network          20,000
Read 1MB sequentially from memory         250,000
Roundtrip within same datacenter          500,000
Disk seek                              10,000,000
Read 1MB sequentially from disk        20,000,000
Send packet CA -> Netherlands -> CA   150,000,000

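These numbers give you some insight into why random reads from a disk are a really bad idea: a single disk seek (10,000,000 nsec) costs as much as 100,000 main memory references, and half as much as reading a whole 1MB sequentially from disk.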
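To make that concrete, here is a rough sketch of the arithmetic in Python. The two latencies come straight from the table; the 4 KB block size, the function names, and the assumption that every scattered block costs one full seek are mine, chosen only for illustration.

```python
# Latencies taken from the table, in nanoseconds.
DISK_SEEK_NS = 10_000_000
READ_1MB_SEQ_DISK_NS = 20_000_000

MB = 1024 * 1024


def random_read_ns(total_bytes, block_bytes=4096):
    """Estimate reading total_bytes from disk as scattered blocks.

    Assumption (not from the table): every block costs one full seek,
    plus its share of the sequential transfer time.
    """
    blocks = total_bytes // block_bytes
    transfer_ns = READ_1MB_SEQ_DISK_NS * total_bytes / MB
    return blocks * DISK_SEEK_NS + transfer_ns


def sequential_read_ns(total_bytes):
    """Estimate reading total_bytes from disk: one seek, then one sequential transfer."""
    return DISK_SEEK_NS + READ_1MB_SEQ_DISK_NS * total_bytes / MB


if __name__ == "__main__":
    rand = random_read_ns(MB)
    seq = sequential_read_ns(MB)
    print(f"1 MB as random 4 KB reads: {rand / 1e6:.0f} ms")
    print(f"1 MB as one sequential read: {seq / 1e6:.0f} ms")
    print(f"slowdown: {rand / seq:.0f}x")
```

Under these assumptions, the same 1MB takes about 2,580 ms when read as 256 scattered blocks versus about 30 ms when read in one pass, roughly an 86x difference. Real disks and file systems will behave differently, so treat this as an estimate, not a measurement.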

This piece of information complements the very nice image from Adam Jacobs and his excellent “The Pathologies of Big Data” article.

Comparison of random and sequential speeds for Memory, SSD and Disk

Random is BAD (and SSD is NOT going to solve the problem)

What should we learn from all this stuff?

  • Do your back-of-the-envelope calculations
  • Do avoid random operations
  • Do benchmark your system
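As an example of the kind of back-of-the-envelope calculation I mean, here is a small Python sketch that compares two designs for a made-up workload: fetching 30 images of 256 KB each from disk, either one after another or spread over several disks. The workload, the function names, and the simplification that parallel reads cost nothing extra in coordination or network time are all assumptions for illustration; only the two latencies come from the table.

```python
# Latencies taken from the table, in nanoseconds.
DISK_SEEK_NS = 10_000_000
READ_1MB_SEQ_DISK_NS = 20_000_000

KB = 1024
MB = 1024 * KB


def read_one_ns(size_bytes):
    """One seek followed by a sequential read of size_bytes."""
    return DISK_SEEK_NS + READ_1MB_SEQ_DISK_NS * size_bytes / MB


def serial_ns(count, size_bytes):
    """Design A: a single disk serves every read, one after another."""
    return count * read_one_ns(size_bytes)


def parallel_ns(count, size_bytes, disks):
    """Design B: reads are spread evenly over several disks.

    Assumption: disks work fully in parallel with no coordination cost,
    so the total is the time of the busiest disk.
    """
    rounds = -(-count // disks)  # ceiling division
    return rounds * read_one_ns(size_bytes)


if __name__ == "__main__":
    a = serial_ns(30, 256 * KB)
    b = parallel_ns(30, 256 * KB, disks=30)
    print(f"serial:   {a / 1e6:.0f} ms")
    print(f"parallel: {b / 1e6:.0f} ms")
```

With these inputs the serial design comes out at about 450 ms and the parallel one at about 15 ms. The estimate is crude, but it is enough to rule out a design before writing any code, and the benchmark then tells you how far off the estimate was.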

