
Posts Tagged ‘proposal’

SMAQ vs CDC

The name SMAQ (Storage, MapReduce And Query) Cloud Stack was proposed by Edd Dumbill in this article on O’Reilly Radar (the article is well worth reading).

While I find that a nice name is a good way to crystallize a concept, I am not sure it captures the whole picture of Big Data.
For example, compare the image on the left, from the article, with the one on the right, which comes directly from my Ph.D. research proposal.

The SMAQ stack for Big Data

View of the SMAQ stack from O'Reilly Radar

Cloud Computing Stack

View of the Data Intensive Cloud Computing stack from my Ph.D. research proposal

I will now dub my stack proposal CDC (Computation Data Coordination) stack (suggestions for better names super welcome!).

The query layer from SMAQ would map to my High Level Languages layer.
This layer includes systems like Pig, Hive and Cascading.

The MapReduce layer from SMAQ would map to my Computation layer.
For the moment, the only other system in this layer is Dryad, but there could be many others.

The Storage layer from SMAQ would map to my Distributed Data layer.
The SMAQ classification does not differentiate between systems like HDFS and HBase.
In my view, the high level interface is what distinguishes HDFS from HBase (or, for example, Voldemort from Cassandra).
That is why I would put HDFS and Voldemort in the Distributed Data layer, and HBase and Cassandra in the Data Abstraction layer (even though Cassandra does not actually rely on another system to store its data).

Finally, the SMAQ stack is totally lacking the Coordination layer.
This is understandable, as the audience of Radar is more “analyst” oriented. From an operational perspective, systems like Chubby and Zookeeper are useful for building the frameworks in the stack.
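
To make the mapping between the two stacks concrete, here is a minimal sketch in Python. The layer names and example systems are the ones mentioned in this post; the dictionary and function names are made up for illustration. Mapping SMAQ's Storage layer onto both of my data layers is my reading of the comparison (SMAQ does not tell HDFS and HBase apart), not something stated in the SMAQ article.

```python
# Layers of the CDC stack with the example systems named in this post.
CDC_LAYERS = {
    "High Level Languages": ["Pig", "Hive", "Cascading"],
    "Computation": ["MapReduce", "Dryad"],
    "Data Abstraction": ["HBase", "Cassandra"],
    "Distributed Data": ["HDFS", "Voldemort"],
    "Coordination": ["Chubby", "Zookeeper"],
}

# SMAQ layer -> CDC layers it covers.
# Assumption: Storage spans both data layers, because SMAQ does not
# differentiate between systems like HDFS and HBase.
SMAQ_TO_CDC = {
    "Query": ["High Level Languages"],
    "MapReduce": ["Computation"],
    "Storage": ["Data Abstraction", "Distributed Data"],
}


def cdc_layers_missing_from_smaq():
    """Return the CDC layers that no SMAQ layer maps to."""
    covered = {layer for layers in SMAQ_TO_CDC.values() for layer in layers}
    return [layer for layer in CDC_LAYERS if layer not in covered]


def cdc_layer_of(system):
    """Return the CDC layer a system is placed in, or None if unknown."""
    for layer, systems in CDC_LAYERS.items():
        if system in systems:
            return layer
    return None


if __name__ == "__main__":
    print(cdc_layers_missing_from_smaq())
    print(cdc_layer_of("HBase"))
```

Keeping the two stacks as plain dictionaries makes it easy to add a new layer or system and see which side of the comparison lacks it.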

What is missing from both stacks, and will be the main trend in 2011, is the Real-Time layer (even though I would have no idea where to put it 🙂 )



PhD thesis proposal

I filed the thesis proposal for my PhD on February 15th, 2010.

The real part of the proposal (the last chapter) is deliberately short and generic. I believe in an agile approach to planning: you can’t know upfront everything you are going to do, so planning every tiny detail in advance is a waste of time. As you do research you gain a deeper understanding of the subject (as in programming), and so you come up with new ideas or trash old ones.

A nice thing to do would be to turn the state-of-the-art chapter into a comparative survey. To do this I need to experimentally evaluate most of the software systems I review. That will certainly not be quick, and I also have to find a good benchmark.

Here is the PDF of my thesis proposal.
