Stars and Stripes - Christopher Hillman

Started ‎04-29-2015 by
Modified ‎04-29-2015 by


Life Sciences Industry

Christopher Hillman


About the Insights

This data visualization displays, as a graph, the results of a clinical drug trial undertaken in the US. Drug trials are frequently very complex and involve collecting data over long periods of time.

Trials usually involve several “arms”, where patients are split into cohorts. Each cohort has different characteristics, such as the specific sequence of drugs prescribed. The result is a substantial amount of data that relates the various trial drugs to a range of observed outcomes. In this visualization we are looking at the linkage between the trial drugs and their negative side effects.

The four star-like images in the chart show different visualizations of the same drug trial. Each of the 5 dots (nodes) that form the points of a star represents a drug or drug variant that was prescribed to patients in cohorts during the trial. Each node in the center represents an unwanted side effect experienced by a patient. The connections between the various drugs and the side effects are displayed as lines (edges).
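The star arrangement described above can be sketched as a simple layout calculation: drug nodes are spaced evenly on an outer circle and side effect nodes on a smaller inner circle. This is only an illustration, not the layout used by the graph software behind the images; the function name `star_layout`, the radii, and the example node names are all assumptions.

```python
import math


def star_layout(drugs, side_effects, outer_radius=1.0, inner_radius=0.3):
    """Return {node: (x, y)} with drugs on an outer circle and side effects near the center.

    Hypothetical helper for illustration; the radii are arbitrary choices.
    """
    positions = {}
    # Drugs form the points of the star, starting at the top and going counter-clockwise.
    for i, drug in enumerate(drugs):
        angle = math.pi / 2 + 2 * math.pi * i / len(drugs)
        positions[drug] = (outer_radius * math.cos(angle),
                           outer_radius * math.sin(angle))
    # Side effects are spread on a smaller circle in the middle of the star.
    for j, effect in enumerate(side_effects):
        angle = 2 * math.pi * j / len(side_effects)
        positions[effect] = (inner_radius * math.cos(angle),
                             inner_radius * math.sin(angle))
    return positions


# Made-up node names; the article does not name the drugs or side effects.
layout = star_layout(["Drug1", "Drug2", "Drug3", "Drug4", "Drug5"],
                     ["nausea", "headache", "dizziness"])
```

With five drugs the outer nodes sit 72 degrees apart, which gives the five-pointed shape once edges are drawn toward the center.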

We can now easily observe each of the drugs around the outside of the stars and their linkages to the various side effects in the center. Four variations of the same visualization are shown here. Each one filters on different elements to highlight a particular finding. For example, one highlights the connection between a certain type of negative side effect and a drug, while another uses color to show the strength of the connection between the 5 drugs tested and the side effects.

About the Analytics

The data collection for this image was fairly complex and involved several steps before the graph software could be used to create the images. First, many reports were downloaded from the website using the extraction tools available on the site. The download produces files in XML format, which need to be pre-processed before analysis; this was done using out-of-the-box Teradata Aster MapReduce functions. Text mining functions were then used to extract the side effect names from the reports pertaining to a particular drug. This allowed a node/edge table to be built as a standard relational table. From this information a graph was created and various graph metrics were calculated. The challenges of data processing include handling outliers and missing values.
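The steps above (parse the XML, extract side effect names, build an edge table, compute graph metrics) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Teradata Aster functions used for the original work: the XML layout, the drug names, and the side effect vocabulary are all made up, and a simple substring match stands in for the text mining step.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report format; the real XML schema is not given in the article.
SAMPLE_REPORTS = """
<reports>
  <report drug="DrugA"><text>Patient reported nausea and headache.</text></report>
  <report drug="DrugA"><text>Mild nausea after the second dose.</text></report>
  <report drug="DrugB"><text>Headache and dizziness observed.</text></report>
  <report><text>Nausea, drug not recorded.</text></report>
</reports>
"""

# Hypothetical vocabulary; a stand-in for the text mining step.
SIDE_EFFECTS = ("nausea", "headache", "dizziness")


def build_edge_table(xml_text, vocabulary=SIDE_EFFECTS):
    """Map (drug, side_effect) to the number of reports that mention both."""
    edges = Counter()
    root = ET.fromstring(xml_text.strip())
    for report in root.iter("report"):
        drug = report.get("drug")
        if not drug:
            continue  # missing value: skip reports with no drug recorded
        text = (report.findtext("text") or "").lower()
        for effect in vocabulary:
            if effect in text:
                edges[(drug, effect)] += 1
    return edges


def node_degrees(edges):
    """A simple graph metric: the number of distinct neighbours of each node."""
    degrees = Counter()
    for drug, effect in edges:
        degrees[drug] += 1
        degrees[effect] += 1
    return degrees


edges = build_edge_table(SAMPLE_REPORTS)
degrees = node_degrees(edges)
```

Each key in `edges` corresponds to one row of a relational node/edge table, and the count can serve as the edge weight. Skipping reports without a drug name is one way to handle missing values; the article does not say how this was handled in practice.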

Each of the four representations offers a way to see different significant patterns in the data. The color of each edge reflects how strong the connection is between a drug and a side effect. This data is published into the public domain and can be obtained from sources such as and
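Coloring an edge by connection strength can be sketched as a linear interpolation between two colors. The article does not state which color scale or strength measure was used, so the grey-to-red palette, the function name `edge_color`, and the use of a raw weight are assumptions made for illustration only.

```python
def edge_color(weight, max_weight):
    """Map an edge weight to a hex color from light grey (weak) to red (strong).

    Hypothetical helper; the palette is an arbitrary choice.
    """
    if max_weight <= 0:
        raise ValueError("max_weight must be positive")
    # Clamp the normalized strength to the range [0, 1].
    t = min(max(weight / max_weight, 0.0), 1.0)
    weak = (200, 200, 200)   # light grey
    strong = (200, 0, 0)     # red
    r, g, b = (round(w + (s - w) * t) for w, s in zip(weak, strong))
    return f"#{r:02x}{g:02x}{b:02x}"


weakest = edge_color(0, 10)
middle = edge_color(5, 10)
strongest = edge_color(10, 10)
```

Normalizing by the largest weight in the graph keeps the scale comparable across the four views, provided the same `max_weight` is used for each.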
