Feeds:
Posts
Comments

Posts Tagged ‘stack’

SMAQ vs CDC

The name SMAQ (Storage MapReduce And Query) Cloud Stack was proposed by Edd Dumbill in this article on O’Reilly Radar (the article is sure worth reading).

While I find that a nice name is a good way to crystallize a concept, I am not sure it captures the whole picture about Big Data.
For example, compare the image on the left from the article, with the one on the right that comes directly from my Ph.D. research proposal.

The SMAQ stack for Big Data

View of the SMAQ stack from O'Reilly Radar

Cloud Computing Stack

View of the Data Intensive Cloud Computing stack from my Ph.D. research proposal

I will now dub my stack proposal CDC (Computation Data Coordination) stack (suggestions for better names super welcome!).

The query layer from SMAQ would map to my High Level Languages layer.
This layer includes systems like Pig, Hive and Cascading.

The MapReduce layer from SMAQ would map to my Computation layer.
The only other system in this layer is Dryad, for the moment, but there could be many others.

The Storage layer from SMAQ would map to my Distributed Data layer.
The SMAQ classification does not differentiate between systems like HDFS and HBase.
According to me, the high level interface is what distinguishes HDFS from HBase (or for example Voldemort from Cassandra).
That is why I would put HDFS and Voldemort in the Distributed Data layer, and HBase and Cassandra in the Data Abstraction layer (even though Cassandra does not actually rely on another system to store its data).

Finally, the SMAQ stack is totally lacking the Coordination layer.
This is comprehensible, as the audience of radar is more “analyst” oriented. From a operational perspective, systems like Chubby and Zookeeper are useful to build the frameworks in the stack.

What is missing from both stacks, and will be the main trend in 2011, is the Real-Time layer (even though I would have no idea where to put it 🙂 )

Read Full Post »