Posts Tagged ‘syntactic sugar’

Actually, this should not be an update but a wrap-up, as I have basically finished my project for this year. My last patch already got a +1 and is just waiting for the tests to finish before it is committed.

I completed my selected tasks, PIG-1926 and PIG-1904 (see my previous post for an explanation of what they do), plus a few smaller fixes here and there: PIG-2156, PIG-2136, PIG-2060, PIG-2026, PIG-2025 and PIG-2024.

I also proposed some longer-term ideas on how to refactor the grammar to make it safer and easier to modify, as well as some new features: PIG-2138, PIG-2123, PIG-2119 and PIG-2047.

However, given that I still have one month left before the official end of GSoC, I will tackle the rest of the “Sugar” projects listed on the Pig GSoC page, which means adding syntax support for Tuple/Map/Bag conversions (PIG-1387).

All my fixes will go into Pig 0.10, as 0.9 has already been branched and will be out very soon.

Working on the front end has been a very interesting and enriching experience.

  • I got to learn how to use ANTLR (my mentor called me an “ANTLR expert” :P).
  • I learned how Pig scripts are compiled and how to work with the logical, physical and MapReduce levels.
  • I now have a thorough understanding of the workflow and the dataflow of the operators in Pig. I am sure this will come in handy in the future.
  • I also increased my proficiency in Pig Latin scripting.
  • Finally, I got to use git seriously and to really appreciate it. It makes working on different patches at the same time a breeze.
See you in a month for the actual wrap-up!


My proposal for this year’s Google Summer of Code (GSoC) has been accepted!
This year I will again be working on Apache Pig.
Last year I worked on the backend and on improving performance. This year, instead, I will work on the front end and on improving usability. I will implement a couple of “syntactic sugar” features for Pig Latin.

  • Variable argument for SAMPLE and LIMIT. (PIG-1926)
    Currently, SAMPLE and LIMIT only take a constant argument. It would be better to be able to use a variable (scalar) in place of the constant.
  • Default SPLIT destination. (PIG-1904)
    SPLIT partitions a relation into two or more relations.
    It would be useful to have a default destination for tuples that are not assigned to any other relation, in a fashion similar to a switch/case/default statement.
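
To make the idea of a default destination concrete, here is a minimal Python sketch of the intended semantics. It is only a model, not Pig's implementation: the function name `split_with_default`, the relation names and the sample rows are all made up for illustration. It assumes, as in Pig's SPLIT, that a tuple can land in more than one relation when several conditions match.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, ...]


def split_with_default(
    rows: List[Row],
    conditions: Dict[str, Callable[[Row], bool]],
    default_name: str,
) -> Dict[str, List[Row]]:
    """Model of SPLIT with a default branch (hypothetical helper, not Pig code)."""
    outputs: Dict[str, List[Row]] = {name: [] for name in conditions}
    outputs[default_name] = []
    for row in rows:
        matched = False
        # A row goes to every relation whose condition it satisfies.
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
                matched = True
        # Rows that matched nothing fall through to the default relation,
        # much like the default branch of a switch/case statement.
        if not matched:
            outputs[default_name].append(row)
    return outputs


# Made-up sample data: (f1, f2) pairs.
rows = [(1, 5), (8, 5), (9, 2)]
result = split_with_default(
    rows,
    {"small": lambda r: r[0] < 7, "five": lambda r: r[1] == 5},
    default_name="rest",
)
```

Without the default branch, the row `(9, 2)` would simply be dropped, because it satisfies neither condition; with it, the row ends up in `rest`.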

These features are simple but quite useful. My proposal outlines some interesting use cases.

This year I will be mentored by Thejas Nair. I am very happy to be able to contribute again to this very interesting open source project.

It’s a pity I didn’t start GSoCing earlier, as this will be my last year (blame my memory: in my first year as a PhD student I missed the deadline by 3 days…).
