SQL MR: Tweet_Streamer

Aster Field Strong
Teradata Employee

THIS IS A BETA CUSTOM SQL MR FUNCTION AND IS NOT SUPPORTED BY TERADATA ENGINEERING, CLIENT SUPPORT, OR THE FIELD.

The Tweet_Streamer is created to directly stream live Twitter events into Teradata Aster database. This SQL-MR will download tweets based on specified number of tweets, keywords and language. The idea is to stage the twitter feeds of JSON format into a payload table. Since the twitter data model may change due to at any time relevant business needs, it is best to acquire the complete JSON data and then later be parsed by using another SQL-MR JsonDataExtractor(), a configurable utility that can extract specific nodes and values of a JSON structure.

The SQL-MR is programmed to use an existing table with an auto-created id and a payload column of varchar type.

The main benefit of having this custom functionis that we can have direct access to live twitter feeds in Aster instead of using other external applications like Flume, HDFS. Execution can also be monitored via AMC. This functionality can also work in conjunction with an existing SQL-MR (load_tweets()) that fetches historical tweets for limited number of attributes.

Usage

Syntax

SELECT * FROM Tweet_Streamer(

    ON (SELECT 1) PARTITION BY 1

    [TOTAL_TWEET_COUNT(<integer>)]

    [BATCH_SIZE(<integer>)]

    CONSUMER_KEY('<consumer_key>)

    CONSUMER_SECRET(‘<consumer_secret>’)

    ACCESS_TOKEN('<access_token>)

    ACCESS_TOKEN_SECRET('<access_token_secret>’)

    OUTPUT_TABLE(‘<table>’)

    OUTPUT_TABLE_COL('<col>')

    [LANGUAGE('<languagei.e. en>')]

    KEYWORDS('<keyword1>',‘<keyword2>’, …, ‘<keyword400>’);

Create Twitter Developer App

1. Go to dev.twitter.com

2. At the bottom of the page, under the Tools, click on ‘Manage Your Apps’

3. Click on ‘Create New App’

4. Provide the details for the following

Name: your_app_name

Description: you_app_description

Website: any_valid_website, i.e. https://www.google.com

Callback URL: any_valid_website, i.e. https://www.google.com

5. Check ‘Agree’ on Developer Rules of the Road

6. Click on ‘Create your Twitter application’

7. Go to tab 'Keys and Access Tokens'.

8. At the bottom of the next page, click on ‘Create your access token’. Refresh the page if the update did not reflect

9. Congratulations you have created your consumer and access tokens.

Sample Usage

SELECT * FROM Tweet_Streamer(

    ON (SELECT 1) PARTITION BY 1

    TOTAL_TWEET_COUNT(100000)

    BATCH_SIZE(200)

    CONSUMER_KEY('LHZY9ntmE11HGcgXGu9c4DLVr')

    CONSUMER_SECRET('w89uXXvyNEWnVOh4ePYYnMjeM9fG53FNJfWOxTXWF5TLmxe5QU')

    ACCESS_TOKEN('371407254-gp9zmtYawiajRlrSa83nFzuwCU0b7mtkZo9Qlexv')

    ACCESS_TOKEN_SECRET('1XJku1VLzFWfTdk1i6ApTotiqYIt5xypAmFvWXjwV2Od7')

    OUTPUT_TABLE('twitterpool')

    OUTPUT_TABLE_COL('json')

    LANGUAGE('en')

    KEYWORDS(':)')

);

CREATE TABLE TWITTERPOOL

(

  ID BIGSERIAL LOCAL

  ,JSON VARCHAR ) DISTRIBUTE BY HASH(JSON);

Argument Semantics

i. ON (SELECT 1) PARTITION BY 1 [required]

  -   A dummy input for syntactical reasons and to make sure that the function runs on a single vworker.

                                            

ii. TOTAL_TWEET_COUNT

  -   The total number of tweets intended to be downloaded/streamed into the target Aster table. If not specified this is defaulted to 1000 tweets.

                                              

iii.     BATCH_SIZE

  -   The batch size controls the number of records to be inserted at a time. If not specified, this is defaulted to 10.

                                             

iv. CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET [required]

  -   These are required parameters that will be used to authenticate with Twitter API.

  -   See instructions on "Create Twitter Developer App" below on how to acquire these credentials.

                                              

v. OUTPUT_TABLE [required]

  -   The output table attribute indicates which table the JSON formatted tweets will be downloaded to.

                                       

vi. OUTPUT_TABLE_COL[required]

  -   The output table column attribute indicates which varchar column the JSON formatted tweets will be downloaded to.

                                           

vii.    LANGUAGE

  -   The language is an optional attribute that can specify the language of the tweet. If not specified all tweets without language preference will be downloaded.

  - Refer to http://msdn.microsoft.com/en-us/library/ms533052(v=vs.85).aspx for list of language codes.

                                    

viii.   KEYWORDS [required]

  -   Accepts a comma-delimited single quoted list of keywords. It can accept up to 400 keywords.

  - Refer to https://dev.twitter.com/rest/public/search, under 'Query operators' section for keyword operator behaviors.