The coolest Sql-MR Function you'll see today


The coolest Sql-MR Function you'll see today


I have to walk a fine line between making a plug and sharing some cool news, so Mods please keep me honest here.

This is the story of a quick-and-dirty little sql-mr function that started out as a simple expression evaluator (add, subtract, multiply, divide) but grew, Frankenstein-like into a full-fledged programming language.

Another programming language?  Doesn't the world have enough of them?  What's so special about this one?

To answer in order: Yes, Probably, and THIS programming language runs IN and AS an sql-mr function.  You pass in the program you want to run AT THE COMMAND LINE and it executes the code, reading in records from the sql-mr function's "ON" clause (and yes, it handles multi-input functions) and passing records back to the database.  And it supports JDBC.  As in read from JDBC via a "cursor" varible, update, delete and insert records and execute arbitrary sql on a JDBC connection.

Basically, we've created a database programming and stored procedure language for nCluster.

Here's an example of using the language:

SELECT * FROM pddbl (
ON ( SELECT person_id, ssn, last_name, first_name FROM persons)
# read in person record, proper-case the names and add a "full_name" column
script format_persons {
# set up output record definition
function setup() {
addoutput("full_name character varying")

# read in records, reformat the names, then output
function main() {
# loop over the input records (the "ON" clause)
while(consume(input0)) {
output.full_name=input0.last_name + ", "
output.full_name=output.full_name + input0.first_name

The other cool thing it does is that it can execute programs that have been previously stored in the database using the "\install" command (which is why it's effectively a stored procedure language").  So assuming the text of the above program (everything in the "SCRIPT()" option to the "pddbl" sql-mr function) had been saved as a text file named "format_persons.txt" and that file had been uploaded to the cluster, you could run the above as:

SELECT * FROM pddbl (
ON ( SELECT person_id, ssn, last_name, first_name FROM persons)

There's a LOT more to it (your usual if/then/else, other types of loops, exception handling via try/catch/finally, array, hashmap and stack variables, user-defined functions, and of course the JDBC functionality), but a forum post really isn't the (for lack of a better word) forum for that sort of thing.

If anyone is interested in seeing it in action, send me a PM to my email address (should be listed in my forum profile, but just in case: