Creating a Custom Sentiment Dictionary in Aster's ExtractSentiment Function

Learn Aster
Teradata Employee

Special Thanks to Mike Riordan

Here's an example of creating a custom sentiment dictionary/lexicon for use with Aster's ExtractSentiment function.

The first minor challenge is to create a 'baseline' table using the Aster default words.  I did that with a simple "\download" command.

(Update: you can mix the internal lexicon file and a custom table.  Here's the syntax.  note, there looks like a bug if you get the syntax wrong, under the covers it leaves an open transaction.

I still like creating a default lexicon/dictionary table.  Much more intuitive and easy to use for customers.  Just knowing this underlying file name is unfriendly)

select * from ExtractSentiment
(
    on txt.reviews partition by any
    on txt.my_dictionary  as dict dimension
    model('dictionary:default_sentiment_lexicon.txt')
    text_column('review')
    level('sentence')
    accumulate('id')
)
;

The lexicon/dictionary file is attached here.

-- Create a custom Dictionary for Sentiment Analysis

-- Step #1 - load the Aster default dictionary as a starting foundation


create table default_dictionary
(
    word     varchar,
    opinion  int
)
distribute by replication
compress low
;

-- now load the text file using ncluster_loader

--ncluster_loader -d poc -W default_dictionary default_sentiment.txt

-- take a look at some records

select * from default_dictionary limit 1000;

-- Step #2 - create test data

create schema txt;

create table txt.reviews
(
    id      int,
    review  varchar
)
distribute by hash(id)
compress low
;

insert into txt.reviews values(1,'I love my new phone');
insert into txt.reviews values(2,'My service coverage is much better than it was last year');
insert into txt.reviews values(3,'My phone call quality seems to be getting worse');
insert into txt.reviews values(4,'Just got my new phone :\)');

-- take a look at the records
select * from txt.reviews;

-- Sentiment #1 - using built-in dictionary

-- notice that it doesn't recognize the smile

select * from ExtractSentiment

(
    on txt.reviews
    text_column('review')
    level('sentence')
    accumulate('id')
)
;

-- create a custom dictionary using the Aster default as the initial records
create table txt.my_dictionary
    distribute by replication
    compress low
as
select * from default_dictionary
;
select count(*) from txt.my_dictionary;
--6,818

insert into txt.my_dictionary values(':\)','1');
select * from txt.my_dictionary limit 100;

-- Sentiment #2 - use my_dictionary


-- notice that the smile is now recognized as positive along with the other words.

select * from ExtractSentiment
(
    on txt.reviews partition by any
    on txt.my_dictionary  as dict dimension
    text_column('review')
    level('sentence')
    accumulate('id')
)
;