This SQL/MR function generates unique BIGINT ids for the rows of an input table. The algorithm guarantees that the ids are unique, but not that they are strictly sequential, so expect some gaps in the id numbers.
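To see how ids can be unique and still have gaps, here is a minimal Python sketch of one possible scheme. It is an assumption made for illustration, not the algorithm ID() actually uses (the source does not describe it). The function name `assign_ids` and the interleaving formula are hypothetical: each worker numbers its own rows locally and never looks at another worker's data.

```python
from typing import List, Tuple


def assign_ids(partitions: List[List[str]]) -> List[Tuple[int, str]]:
    """Tag every row with a unique integer id, one partition at a time.

    Hypothetical scheme (an assumption, not the real ID() internals):
    worker w gives its k-th row the id k * n + w, where n is the number
    of workers. Ids from different workers can never collide, and no
    rows have to move between workers.
    """
    n = len(partitions)
    tagged = []
    for w, rows in enumerate(partitions):
        for k, row in enumerate(rows):
            tagged.append((k * n + w, row))
    return tagged


# Three hypothetical workers holding uneven slices of the input table.
partitions = [["a", "b", "c"], ["d"], ["e", "f"]]
tagged = assign_ids(partitions)
ids = sorted(i for i, _ in tagged)
print(ids)  # [0, 1, 2, 3, 5, 6]
```

Because the second worker holds only one row, id 4 is never issued. That is the kind of gap to expect whenever the data is not spread perfectly evenly, while uniqueness still holds.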
Why not ROW_NUMBER()?
The biggest advantage of ID() over ROW_NUMBER() OVER (PARTITION BY 1) is that it generates ids for table rows in a way that scales linearly, because it has no repartitioning operation. ID() uses a truly parallel algorithm. It is especially useful when you load data from a fresh data source and each row needs an id tag for text analysis, XML parsing, or JSON parsing, regardless of row order. For example, if you have just loaded a table with 12 billion rows and need to give each row a unique id, just use ID()!
SELECT * FROM ID(
The output schema and data match the input table, with one additional BIGINT column called id.
CREATE TABLE webclicks_with_id DISTRIBUTE BY HASH(id)