Actually, this should not be an update but a wrap-up, as I have basically finished my project for this year. My last patch already got a +1 and is just waiting for the tests to finish before it gets committed.
I completed my selected tasks, PIG-1926 and PIG-1904 (see my previous post for an explanation of what they do), plus a few small fixes here and there: PIG-2156, PIG-2136, PIG-2060, PIG-2026, PIG-2025 and PIG-2024.
I also contributed some longer-term ideas on how to refactor the grammar to make it safer and easier to modify, and on some new features: PIG-2138, PIG-2123, PIG-2119 and PIG-2047.
However, given that I still have one month left before the official end of GSoC, I will tackle the rest of the “Sugar” projects listed on the Pig GSoC page, which means adding syntax support for Tuple/Map/Bag conversions (PIG-1387).
All my fixes will go in Pig 0.10, as 0.9 has already been branched and will be out very soon.
Working on the front end has been a very interesting and enriching experience.
- I got to learn how to use ANTLR (my mentor called me an “ANTLR expert” :P).
- I learned how Pig scripts are compiled and how to work with the logical, physical and MapReduce levels.
- I have a full understanding of the workflow and the dataflow of the operators in Pig. I am sure this will come in handy in the future.
- I also became more proficient in Pig Latin scripting.
- Finally, I got to use git seriously and learned to appreciate it. It makes working on different patches at the same time a breeze.
See you in a month for the actual wrap-up!
My proposal for this year’s Google Summer of Code (GSoC) has been accepted!
This year I will again be working on Apache Pig.
Last year I worked on the backend and on improving performance. This year I will instead work on the front end and on improving usability. I will implement a couple of “syntactic sugar” features for Pig Latin.
- Variable argument for SAMPLE and LIMIT. (PIG-1926)
Currently, SAMPLE and LIMIT only take a constant argument. It would be better to be able to use a variable (scalar) in place of the constant.
- Default SPLIT destination. (PIG-1904)
SPLIT partitions a relation into two or more relations.
It would be useful to have a default destination for tuples that are not assigned to any other relation, in a fashion similar to a switch/case/default statement.
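To make the intended behavior of the two features concrete, here is a minimal sketch in Python rather than Pig Latin. It only models the semantics described above and is not how Pig implements them. The function names `limit_by_scalar` and `split_with_default`, the row layout and the thresholds are all hypothetical, chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical row layout for illustration: (key, value).
Row = Tuple[str, int]


def limit_by_scalar(rows: List[Row], percent: int) -> List[Row]:
    """Model of LIMIT with a variable argument.

    The limit is not a constant written in the script. It is a scalar
    computed from the data, here a percentage of the total row count.
    """
    total = len(rows)            # the scalar derived from the relation
    n = total * percent // 100   # integer arithmetic avoids float rounding
    return rows[:n]


def split_with_default(
    rows: List[Row],
    conditions: Dict[str, Callable[[Row], bool]],
    default: str,
) -> Dict[str, List[Row]]:
    """Model of SPLIT with a default destination.

    A row goes to every output whose condition it satisfies. A row that
    satisfies no condition goes to the default output, much like the
    default branch of a switch statement.
    """
    if default in conditions:
        raise ValueError("default name clashes with a conditional output")
    outputs: Dict[str, List[Row]] = {name: [] for name in conditions}
    outputs[default] = []
    for row in rows:
        matched = False
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
                matched = True
        if not matched:
            outputs[default].append(row)
    return outputs


if __name__ == "__main__":
    data = [("a", 3), ("b", 12), ("c", 7), ("d", 25), ("e", 1)]
    print(limit_by_scalar(data, 40))
    print(split_with_default(
        data,
        {"small": lambda r: r[1] < 5, "large": lambda r: r[1] > 15},
        default="rest",
    ))
```

The same idea carries over to SAMPLE: the sampling fraction would be a value computed from the data instead of a literal. Without a default destination, the rows that match no condition are only kept if a catch-all condition is written by hand as the negation of all the others.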
These features are simple but quite useful. My proposal outlines some interesting use cases.
This year I will be mentored by Thejas Nair. I am very happy to be able to contribute again to this very interesting open source project.
It’s a pity I didn’t start doing GSoC earlier, since this will be my last year (blame my memory: in my first year as a PhD student I missed the deadline by 3 days…).