Sentiment Analysis from Nike's Breaking2 Project

Teradata Employee

At the beginning of May, Nike ran a campaign to break the two-hour marathon barrier (#Breaking2). The runners wore Nike gear and ran on a track chosen by Nike; in this highly controlled environment, the goal was to record a marathon time under two hours. The whole event was live streamed on the Nike Facebook page. The fastest time to come out of the effort was 2:00:25, run by Kenyan long-distance runner Eliud Kipchoge.

Many are quick to point out that this seems like a big publicity stunt for Nike, especially since the event happened a few weeks before Nike was set to release a new line of running shoes. But do these doubts take away from the incredible result that came from this experiment?

With Aster, we can mine the comments left on the live Facebook videos from the event and look at the sentiment associated with them. A data set of 27,219 comments was retrieved for analysis.

The first step in finding sentiment in any social media data set is building a custom sentiment dictionary. For this event, traditionally negative words like “break”, “insanity”, and “limits” don’t carry negative sentiment, and in most social media contexts words like “dope”, “chill”, and “crazy” don’t indicate negative comments either. Once a base social media dictionary is created, it can be reused in other social media use cases.
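To make the dictionary adjustment concrete, here is a minimal sketch in plain Python. It is not the Aster workflow used for this analysis: the lexicon entries, the scores, and the function names (build_lexicon, classify, sentiment_breakdown) are all assumptions made up for illustration. The idea is simply that the override step neutralizes words that a general-purpose dictionary would count as negative.

```python
import re
from collections import Counter

# Hypothetical base lexicon (word -> polarity). Invented for illustration;
# this is not the dictionary used in the original analysis.
BASE_LEXICON = {
    "amazing": 1, "incredible": 1, "love": 1,
    "lame": -1, "fail": -1, "cheating": -1,
    "break": -1, "insanity": -1, "limits": -1,
    "dope": -1, "chill": -1, "crazy": -1,
}

# Event-specific and social-media-specific words that should no longer
# count as negative, so their score is reset to neutral (0).
OVERRIDES = {w: 0 for w in ("break", "insanity", "limits", "dope", "chill", "crazy")}


def build_lexicon(base, overrides):
    """Return a copy of the base lexicon with the overrides applied."""
    lexicon = dict(base)
    lexicon.update(overrides)
    return lexicon


def classify(comment, lexicon):
    """Label a comment by summing the polarity of the words it contains."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_breakdown(comments, lexicon):
    """Count how many comments fall into each sentiment class."""
    return Counter(classify(comment, lexicon) for comment in comments)


custom_lexicon = build_lexicon(BASE_LEXICON, OVERRIDES)
```

With the unadjusted lexicon, a made-up comment such as "Break the limits, this is crazy" scores as negative; with the adjusted one it comes out neutral, which is the effect the dictionary step is after. Comments with no dictionary words at all, such as a friend tag, also land in the neutral class.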

After the sentiment dictionary is adjusted we can look at the breakdown of sentiment in this comment data set.

The results are not too surprising: most social media comment data sets have a very large share of neutral comments. These include comments where people tag their friends in the post or ask Nike a question directly.

We can break the sentiment down further with cosine similarity to find out what kinds of things people talk about in positive and negative comments.


The visualization above shows the comments with positive sentiment in the data set. Every comment is a node, and the edges indicate the similarity score between comments. The labels summarize the topic of each cluster; in this case they are pretty much verbatim what the comments in the cluster said. The top positive comments are very generic, with high similarity scores, so there isn’t much information to be gleaned from them.
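For readers who want to see how a comment graph like this can be built, here is a rough Python sketch. It assumes simple word-count vectors and a similarity cutoff of 0.5; both are illustrative choices, not the settings used in Aster, and the function names are hypothetical. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so two comments that use the same words in the same proportions score 1 and two comments with no words in common score 0.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(comment):
    """Represent a comment as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", comment.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty comment is not similar to anything
    return dot / (norm_a * norm_b)


def similarity_edges(comments, threshold=0.5):
    """Return (i, j, score) for every pair of comments at or above the cutoff.

    Each comment index is a node; each returned tuple is a weighted edge.
    The threshold value is an assumption chosen for illustration.
    """
    vectors = [term_vector(comment) for comment in comments]
    edges = []
    for i, j in combinations(range(len(comments)), 2):
        score = cosine_similarity(vectors[i], vectors[j])
        if score >= threshold:
            edges.append((i, j, score))
    return edges
```

Comparing every pair of comments grows quadratically with the number of comments, so this sketch suits a small sample rather than a full data set.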

In the negative comments there are some generic clusters, like “lame” and “fail”, but many of the top comment topics give helpful information. The labels are again an overview of the topic of the comments. Unlike the generic comments in the positive set, the negative comments cover a range of distinct topics. Many were about the environment in which the race was set up and how the result can’t be counted as an official time; this can be seen in comments saying that pacing is cheating or pointing to the pace car's effect on wind resistance. There were also many comments about the entertainment side of the race: many people did not like the announcer in the live stream and thought including Kevin Hart in this marathon effort was an odd choice.

Though there were a few comments about this being a marketing stunt, it wasn’t an overwhelming theme among the negative comments. There was a bigger set of people reacting to those comments than people actually making them (the “haters gonna hate” topic). People were quicker to complain about the logistics of the race than about the idea of it, which is good news if Nike wants to host another #Breaking2 event.

1 Comment
Teradata Employee

That was an interesting analysis.  I was pulling for him to come in under 2 hours!