Art of Analytics: Crown of Thorns



 

About the Insights

This is the second visualization in Kailash Purang's two-part CIA Report series. It demonstrates the ability of advanced analytics to rapidly distill extremely complex documents into easily consumed visualizations, free of human bias. It should be viewed after Part One, 'Terror Report'.

 

In this second visualization, Kailash analyzes the same data as the 'Terror Report' word cloud, using more sophisticated text and graph analysis to reveal significantly more of the storyline and meaning of the report itself. Each dot (or node) is a significant word appearing in the report; larger nodes are words that occur more often. The lines (or edges) link each word to the other words it appeared with. Darker, thicker edges link words that occur together more frequently. The main storylines and subjects now show up as word clusters, along with their links to each other. Starting at the top left-hand corner, you see the name Abu Zubaydah among words like waterboarding, rectal, mother, brutal and harm. Edges link to enhanced interrogation techniques, CIA and detainees, and to smaller word groups like oversight and actively avoided. The cluster shows the treatment that Abu Zubaydah, who is still in captivity, has received, and we can trace the edges outward to see the surrounding issues of how and why it was allowed to happen.

 

By studying the visualization, the reader can quickly absorb the key details of, and the interplay between, all the subjects covered by this very complex report, free of human bias and filtering.

 

About the Analytics

This visualization uses Teradata Aster's text mining capabilities on the 525-page excerpt, publicly released on December 9, 2014, of the Committee Study of the Central Intelligence Agency's Detention and Interrogation Program compiled by the U.S. Senate Select Committee on Intelligence.

 

Term frequency–inverse document frequency (TF-IDF) was used to isolate the critical words and word groups within the report. The algorithm compares how often a word occurs in one piece of text with how often it occurs across the whole body of text. A word that is important to a specific piece of text will occur relatively frequently in that piece compared with the whole body, while a word that appears everywhere scores low.
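The scoring idea described above can be sketched in a few lines of plain Python. This is only an illustration of one common TF-IDF variant, not the Aster function used for the visualization; the function name tf_idf and the toy "pages" are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document.

    `documents` is a list of token lists. The weighting here
    (relative term frequency times log of inverse document frequency)
    is one common textbook variant, assumed for illustration.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical toy "pages"; a real run would tokenize the report itself.
pages = [
    "the detainee was held at the site".split(),
    "the committee reviewed the program".split(),
    "the program held the detainee".split(),
]
scores = tf_idf(pages)
```

In this toy input, "the" appears on every page, so its score is zero everywhere, while "committee" appears on only one page and scores highest there. That is the behavior that lets the significant words stand out from the filler.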

 

The detailed data on connections between words was obtained using native Aster text mining functions such as nGram. The output was used to create an underlying node-edge table, which was then visualized as a graph in Aster Lens with the emphasis on the connections. This lets clear clusters of words emerge, each representing an individual idea.
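As a rough sketch of how word pairs can become a node-edge table, the Python below counts adjacent pairs (bigrams) among a set of significant words. It does not reproduce Aster's nGram function or Aster Lens; the function names, the choice of bigrams, the undirected edges, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter

def bigram_edges(tokens, keep):
    """Count adjacent word pairs where both words are in `keep`."""
    edges = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left in keep and right in keep and left != right:
            # Sort the pair so (a, b) and (b, a) count as one undirected edge.
            edges[tuple(sorted((left, right)))] += 1
    return edges

def node_edge_tables(tokens, keep):
    """Return (nodes, edges): word counts and pair counts."""
    nodes = Counter(t for t in tokens if t in keep)
    edges = bigram_edges(tokens, keep)
    return nodes, edges

# Hypothetical sample text and significant-word set (e.g. top TF-IDF words).
tokens = ("enhanced interrogation techniques were applied "
          "enhanced interrogation continued").split()
keep = {"enhanced", "interrogation", "techniques"}

nodes, edges = node_edge_tables(tokens, keep)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The node counts would drive dot size and the edge weights would drive line thickness and darkness in a graph layout, which is how frequently co-occurring words end up drawn together as a cluster.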

 

About the Analyst

Kailash is the lead Data Scientist for Teradata in Singapore. He also works across Southeast Asia, most notably in Indonesia, supporting the leading banking and communications clients Teradata serves in the region.

 

Kailash holds a Bachelor of Economics and Statistics as well as a Master's in Economics from the National University of Singapore. He also holds a Bachelor of Management from the University of London. He has worked in the field of analytics for 15 years across various industries.

 

Despite having ‘sold his soul’ to join the commercial world, he still believes that the aim of all this learning and technology is to make people’s lives easier and more fun. To help introduce analytics in a fun, ‘tear-less’ way, he spends his spare time creating visualizations that show how everybody can benefit from simple analytic applications.

 

As a Data Scientist for Teradata, he strives to help his clients realize the full potential of 'Big Data' so that their customers can benefit from better services and offerings.