The fable The Blind Men and the Elephant teaches us that looking at something from only one perspective is not conducive to understanding the big picture.
The same is absolutely true for your data sets! In this post we will look at clickstream data through the lens of several genres of analytics in an attempt to understand the big picture. This thought experiment is a beginning phase of the Multi-Genre Analytics approach: we are looking at the data set through multiple genres but not yet combining them to solve a specific business problem. Although there are many genres of analytics, we will focus on five that are easily visualized and great for providing quick insights about the big picture.
Data as a Table
Time stamp
4/1/09 1:29 PM
4/1/09 1:32 PM
4/1/09 1:34 PM
4/1/09 1:38 PM
4/1/09 3:41 PM
4/1/09 3:43 PM
4/2/09 5:43 AM
4/2/09 5:47 AM
4/2/09 5:50 AM
We are using a data set of roughly 3M rows about customer browsing behavior. Each row has a session, a web page, and a time stamp. By viewing the data as a table we can easily see the type of data we're working with and begin to do some basic business reporting. By using aggregates we can answer simple statistical questions like 'How many sessions were there per month?' and 'What are the most popular web pages?'
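To make the two aggregate questions concrete, here is a minimal Python sketch. The sample rows, the page names, and the function names are all made up for illustration; only the time stamp format is taken from the table above, and the real data set would of course live in a database rather than a list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (session id, web page, time stamp).
# Sessions and page names are invented; only the time stamp format is from the post.
clicks = [
    (1, "home", "4/1/09 1:29 PM"),
    (1, "search results", "4/1/09 1:32 PM"),
    (1, "view product", "4/1/09 1:34 PM"),
    (2, "view product", "4/1/09 3:41 PM"),
    (3, "home", "5/2/09 5:43 AM"),
    (3, "view product", "5/2/09 5:47 AM"),
]


def parse_ts(text):
    """Parse a time stamp such as '4/1/09 1:29 PM' (assumed month/day/year)."""
    return datetime.strptime(text, "%m/%d/%y %I:%M %p")


def sessions_per_month(rows):
    """Count distinct sessions that had at least one click in each (year, month)."""
    seen = set()
    for session_id, _page, ts in rows:
        t = parse_ts(ts)
        seen.add((t.year, t.month, session_id))
    return Counter((year, month) for year, month, _sid in seen)


def most_popular_pages(rows, n=3):
    """Return the n pages with the most views, most viewed first."""
    return Counter(page for _sid, page, _ts in rows).most_common(n)


print(sessions_per_month(clicks))
print(most_popular_pages(clicks))
```

A session that spans two months would be counted once in each month here; whether that is the right business definition is a choice we would need to make before reporting the numbers.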
Data as a Path
We next view our data as a path to see how customers are navigating the website. The picture above shows the 100 most common paths in this data set. Some common behaviors quickly jump out.
Now that we have an idea of what's going on, we could refine the path that we're looking for to gain further insights.
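As a rough illustration of what "most common paths" means, the sketch below orders each session's clicks by time and counts identical page sequences. The sample rows and function names are hypothetical, and this is only one simple way to define a path, not necessarily how the visualization was produced.

```python
from collections import Counter, defaultdict
from datetime import datetime

TS_FORMAT = "%m/%d/%y %I:%M %p"  # assumed format, e.g. '4/1/09 1:29 PM'

# Hypothetical rows (session id, web page, time stamp), deliberately out of order.
clicks = [
    (1, "view product", "4/1/09 1:34 PM"),
    (1, "home", "4/1/09 1:29 PM"),
    (1, "search results", "4/1/09 1:32 PM"),
    (2, "home", "4/1/09 3:41 PM"),
    (2, "search results", "4/1/09 3:43 PM"),
    (2, "view product", "4/1/09 3:50 PM"),
    (3, "view product", "4/2/09 5:43 AM"),
]


def session_paths(rows):
    """Map each session id to its pages in time order."""
    by_session = defaultdict(list)
    for session_id, page, ts in rows:
        by_session[session_id].append((datetime.strptime(ts, TS_FORMAT), page))
    return {
        sid: tuple(page for _t, page in sorted(events, key=lambda e: e[0]))
        for sid, events in by_session.items()
    }


def most_common_paths(rows, n=100):
    """Return the n most frequent full-session paths with their counts."""
    return Counter(session_paths(rows).values()).most_common(n)


for path, count in most_common_paths(clicks):
    print(count, " > ".join(path))
```

Refining the path we are looking for could then be as simple as filtering these tuples, for example keeping only the paths that end on a particular page.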
Data as a Tree
Viewing the data as a tree gives us an unaggregated view of the common ways that customers move about the website. As in the path view, we see that customers start with either the home page or a view product page. With this view we can then see if there are any differences in behavior later on in the browsing session.
Trees are also great for understanding behavior that isn't linear. Online shopping is a great example, since it is common for customers to hold the Ctrl key and open multiple tabs from a search results page. Linearly, it would look like the customer viewed an item after viewing another item. Hierarchically, we know that those items are being viewed at the same time, with customers even jumping back and forth between pages.
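One simple way to get a tree out of session paths is a prefix tree, where each node counts how many sessions passed through that sequence of pages. The sketch below assumes the paths have already been extracted; the sample paths and function names are hypothetical. Note that it still treats every session as linear: modeling the multiple-tabs case would need information about which page each click came from, which we are not assuming here.

```python
# Hypothetical session paths, each an ordered tuple of pages.
paths = [
    ("home", "search results", "view product"),
    ("home", "search results", "view product", "checkout"),
    ("home", "view product"),
    ("view product",),
]


def build_tree(paths):
    """Build a prefix tree; each node stores a visit count and its children."""
    root = {"count": 0, "children": {}}
    for path in paths:
        root["count"] += 1
        node = root
        for page in path:
            node = node["children"].setdefault(page, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def render(node, depth=0):
    """Return indented text lines, busiest branches first."""
    lines = []
    ordered = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for page, child in ordered:
        lines.append("  " * depth + f"{page} ({child['count']})")
        lines.extend(render(child, depth + 1))
    return lines


tree = build_tree(paths)
print("\n".join(render(tree)))
```

Reading down a branch shows where sessions that started the same way begin to diverge, which is the "differences in behavior later on" that the tree view makes visible.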
Data as a Graph
Graphs allow us to understand the relationships happening in our data space. Here we are looking at how customers move from one page to another or how the various pages are related. From this graph we can see that
We might also look at the relationship between customers and items bought to see if there is a trend between the type of customer and the type of item.
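For the page-to-page view, a graph can be sketched as weighted directed edges, where the weight is the number of times customers moved from one page to the next. The sample paths and function names below are hypothetical, and a real analysis would likely hand the edge list to a graph library for layout and metrics.

```python
from collections import Counter

# Hypothetical session paths, each an ordered tuple of pages.
paths = [
    ("home", "search results", "view product"),
    ("home", "search results", "view product", "checkout"),
    ("home", "view product"),
    ("view product",),
]


def transition_counts(paths):
    """Count directed edges (from_page, to_page) over all consecutive clicks."""
    edges = Counter()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return edges


def inbound_traffic(edges):
    """Sum edge weights arriving at each page."""
    totals = Counter()
    for (_src, dst), weight in edges.items():
        totals[dst] += weight
    return totals


edges = transition_counts(paths)
for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst}: {weight}")
print(inbound_traffic(edges).most_common(1))
```

Sessions with a single page contribute no edges at all, so pages that are mostly visited on their own can look less important in the graph than they are by raw volume.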
Data as Text
Although this is a text-sparse data set, there are some insights that are clear here and not when we view the data from another genre. In path and graph we saw that search results was the most prominent page, but when we look at sheer volume, view product stands out. This means that there are many customers who are coming directly to a product page and then doing nothing else. We might look at those specific product pages to determine a cause for this. Do they have less of a product description? Is the checkout button not working here? What might be causing this sparse behavior?
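The contrast between sheer volume and single-page visits can be checked with two small counts. This is a sketch on made-up paths with hypothetical function names; it treats each page name as a "term" and simply counts occurrences, which is roughly what a word-cloud style text view would weight by.

```python
from collections import Counter

# Hypothetical session paths; several sessions consist of one product view only.
paths = [
    ("home", "search results", "view product"),
    ("view product",),
    ("view product",),
    ("home", "search results"),
    ("view product",),
]


def page_volume(paths):
    """Count every page view, regardless of where it sits in a session."""
    return Counter(page for path in paths for page in path)


def single_page_sessions(paths):
    """Count sessions that consist of exactly one page, keyed by that page."""
    return Counter(path[0] for path in paths if len(path) == 1)


print(page_volume(paths).most_common())
print(single_page_sessions(paths).most_common())
```

If the single-page count for view product is a large share of its total volume, that supports the reading that many customers land on a product page and then do nothing else.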
We started by knowing very little about a clickstream data set. By looking at the data with visualizations from several genres of analytics, we were able to get a better understanding of how customers are experiencing this specific website. We gained new insights from each type of analytic and by the end were less likely to make poor assumptions about what was going on. This gave us clear questions to investigate further and insights on common business questions like 'How can we improve online sales?'