With the ubiquity of social media, how important is it for executives, product managers, marketers, and data scientists to have their ears tuned in to what their customers are saying on social media? Would these brand strategists be able to quickly uncover: new features customers are demanding, reoccurring problems with products, whether customers have a negative or positive sentiment about a particular product, and act quickly upon these insights to improve customer experience? With over 300 million tweets per day, Twitter provides a powerful opportunity to gain rapid and valuable insights into their customers'.
Being able to uncover new trends and act upon them can often seem difficult to achieve because the lack of data science resources and the 3 V's of Big Data, Volume, Variety, and Velocity. Teradata Aster provides a plug and play solution that deals with the 3 V's and adds an additional V to the equation, Value. Using Aster a new equation emerges, 3V + V = 4V, or 4 minutes needed to begin to uncover actionable insights.
One of the most exciting and fun features of Aster is the rich text analytic functions that it includes, wrapped in an easy to use SQL interface. A few of the functions that I will demonstrate today are: Text Parser, NGrams, and Load Tweets(Uses The Twitter Search API). I am assuming that you have access to Aster, Aster Express can be downloaded here https://aster-community.teradata.com/docs/DOC-1542 if you don't already have Aster access, and create the Twitter API Consumer Keys and Tokens, which can be done here https://dev.twitter.com/.
Quick Load Tweets using the Twitter Search API
Remove Stop Words, Parse Tweets
Find common word pairs or ngrams
Below is the SQL and SQL-MR Syntax that I used to quickly uncover insights about ESPN and Sports Center(Remember that Twitter API Only Searches 14 days), and did it in under 3 minutes.
create table espn.sportscenter_tweets1 distribute by hash(id) as select * from load_tweets ( on (SELECT 1) consumer_key('yourkeyhere') consumer_secret('yoursecrethere') token('yourtokenhere') token_secret('yourtokensecrethere') search_term('Sports Center') search_type('recent') since('2016-12-26') until('2017-01-06') );
2 Minute Text Parser
create table espn.text_parser_out distribute by hash(id) as select * from Text_Parser(ON espn.espn_tweets1 TEXT_COLUMN('tweet') CASE_INSENSITIVE('true') STEMMING('false') PUNCTUATION('\[.,?\!\]') LIST_POSITIONS('false') REMOVE_STOP_WORDS('true') OUTPUT_BY_WORD('true') ACCUMULATE('id', 'tweet')) ORDER BY frequency desc;
Finding Most Frequent Words create table espn.out_words distribute by replication as select token, sum(frequency) as frequency from espn.text_parser_out group by token order by frequency desc limit 45;
3 Minute NGram
create table espn.tweets_ngram distribute by replication as select ngram, n, frequency from nGram( on espn.espn_tweets1 TEXT_COLUMN('tweet') DELIMITER(' ') GRAMS(3) OVERLAPPING('true') CASE_INSENSITIVE('true') PUNCTUATION('\[.,?\!\]') RESET('\[.,?\!\]') ACCUMULATE('id')) ORDER BY ngram;
Finding Most Frequent Word Pairs
create table espn.out_ngrams distribute by replication as select ngram, sum(frequency) as frequency from espn.tweets_ngram group by ngram order by frequency desc limit 45;
stepping down as
chris berman is
berman is stepping
is stepping down
be the same
espn nfl host
sundays will never
host and sundays
down as espn
oklahoma rb joe
Between both the Frequent Words and Word Pairs we can see that ESPN Legendary Host Chris Burman stepping down was a big Twitter topic. Now find your insights and share below.