Twitter: 0 to Insights in 4 Minutes with Teradata Aster

Learn Data Science
Not applicable

With the ubiquity of social media, how important is it for executives, product managers, marketers, and data scientists to have their ears tuned in to what their customers are saying on social media? Would these brand strategists be able to quickly uncover: new features customers are demanding, reoccurring problems with products, whether customers have a negative or positive sentiment about a particular product, and act quickly upon these insights to improve customer experience?  With over 300 million tweets per day, Twitter provides a powerful opportunity to gain rapid and valuable insights into their customers'.

Being able to uncover new trends and act upon them can often seem difficult to achieve because the lack of data science resources and the 3 V's of Big Data, Volume, Variety, and Velocity. Teradata Aster provides a plug and play solution that deals with the 3 V's and adds an additional V to the equation, Value. Using Aster a new equation emerges, 3V + V = 4V, or 4 minutes needed to begin to uncover actionable insights.

One of the most exciting and fun features of Aster is the rich text analytic functions that it includes, wrapped in an easy to use SQL interface. A few of the functions that I will demonstrate today are: Text Parser, NGrams, and Load Tweets(Uses The Twitter Search API). I am assuming that you have access to Aster, Aster Express can be downloaded here https://aster-community.teradata.com/docs/DOC-1542 if you don't already have Aster access, and create the Twitter API Consumer Keys and Tokens, which can be done here https://dev.twitter.com/.

MinuteSQL-MR FunctionPurpose
1Load_TweetsQuick Load Tweets using the Twitter Search API
2Text_ParserRemove Stop Words, Parse Tweets
3NgramsFind common word pairs or ngrams

 

Below is the SQL and SQL-MR Syntax that I used to quickly uncover insights about ESPN and Sports Center(Remember that Twitter API Only Searches 14 days), and did it in under 3 minutes.

 

1 Minute

create table espn.sportscenter_tweets1 distribute by hash(id)
as
select * from load_tweets ( on
(SELECT 1)
consumer_key('yourkeyhere')
consumer_secret('yoursecrethere')
token('yourtokenhere')
token_secret('yourtokensecrethere')
search_term('Sports Center')
search_type('recent')
since('2016-12-26')
until('2017-01-06')
);

2 Minute Text Parser

create table espn.text_parser_out distribute by hash(id)
as
select * from Text_Parser(ON espn.espn_tweets1
TEXT_COLUMN('tweet')
CASE_INSENSITIVE('true')
STEMMING('false')
PUNCTUATION('\[.,?\!\]')
LIST_POSITIONS('false')
REMOVE_STOP_WORDS('true')
OUTPUT_BY_WORD('true')
ACCUMULATE('id', 'tweet'))
ORDER BY frequency desc;

 

Finding Most Frequent Words
create table espn.out_words
distribute by replication
as
select token, sum(frequency) as frequency from espn.text_parser_out
group by token
order by frequency desc
limit 45;

TermFrequency
espn1880
chris960
burman949
nfl940
stepping569
down577
would558
he551
host469

3  Minute NGram

create table espn.tweets_ngram distribute by replication
as
select ngram, n, frequency from nGram(
on espn.espn_tweets1
TEXT_COLUMN('tweet')
DELIMITER(' ')
GRAMS(3)
OVERLAPPING('true')
CASE_INSENSITIVE('true')
PUNCTUATION('\[.,?\!\]')
RESET('\[.,?\!\]')
ACCUMULATE('id'))
ORDER BY ngram;

 

Finding Most Frequent Word Pairs

create table espn.out_ngrams
distribute by replication
as
select ngram, sum(frequency) as frequency from espn.tweets_ngram
group by ngram
order by frequency desc
limit 45;

NGramFrequency
stepping down as520
chris berman is488
berman is stepping460
is stepping down459
be the same409
espn nfl host398
sundays will never394
host and sundays389
down as espn359
oklahoma rb joe348

 

Between both the Frequent Words and Word Pairs we can see that ESPN Legendary Host Chris Burman stepping down was a big Twitter topic. Now find your insights and share below.

Attachment:
Load_Tweets Function

Twitter Search Operators Table