Data Warehouse Disruption: Conquering the Long Tail and the Black Swan

Teradata Employee

In this blog we ask the question, “How do the Long Tail concept and Black Swan events relate to and impact the Enterprise Data Warehouse and Active Enterprise Intelligence?” This blog will be a discussion platform for the Data Warehouse disruptions caused by these concepts and how to accommodate them.

To recap, Chris Anderson published an article in Wired magazine in October 2004 titled “The Long Tail” (http://www.wired.com/wired/archive/12.10/tail.html), which has since been expanded into a book published in 2006. The article described how online media and entertainment retailers were starting to change the dynamics of product mix compared with the prior experience of brick-and-mortar retailers.

Retail product-mix sales statistics have mostly followed the 80/20 rule: 80 percent of sales come from 20 percent of the products available for sale. Mr. Anderson refers to this as the “hit driven” economics of physical distribution channels. There is only so much shelf space in a store for books and CDs to be displayed and stored, and there are only so many theaters to show the thousands of movies that are created each year. Economics dictate that the products distributed must pay for the rent of the physical space and distribution channels allocated to them.

Online retailers do not face this restriction on the number of titles they can offer. For media downloads, adding the next title to inventory costs virtually nothing. For physical goods, the cost of the next title offered is also very low, whether the goods are stored in centralized warehouses that serve an entire country or are held for distribution at the manufacturer or a third-party warehouse.

Once online retailers started to offer a virtually unlimited number of titles, Mr. Anderson’s investigations found a curious fact: almost all titles offered sell at least once a month. That is, a larger percentage of sales than might be expected comes from goods not offered in physical stores. In one example comparing book sales at online retailers and brick-and-mortar (B&M) retailers, 50 percent of the online retailers’ sales came from books not stocked by their largest B&M competitors. This is the “Long Tail” of the distribution curve for the products offered by an enterprise mapped against total sales.
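To make the arithmetic behind these two observations concrete, here is a minimal sketch in Python. The function names (head_share, tail_share) and all sales figures are hypothetical. The numbers were chosen only so that the results reproduce the 80/20 and 50 percent ratios mentioned above; they are not Mr. Anderson's data.

```python
def head_share(sales, head_fraction=0.2):
    """Fraction of total sales contributed by the best-selling head_fraction of titles."""
    ranked = sorted(sales, reverse=True)
    head_count = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_count]) / sum(ranked)


def tail_share(sales, shelf_capacity):
    """Fraction of total sales from titles ranked beyond a store's shelf capacity."""
    ranked = sorted(sales, reverse=True)
    return sum(ranked[shelf_capacity:]) / sum(ranked)


# Hypothetical monthly unit sales for a store that can stock only 10 titles.
store_sales = [500, 300] + [25] * 8

# Hypothetical online catalog: the same 10 titles plus 1,000 niche titles
# that each sell once a month.
online_sales = store_sales + [1] * 1000

print(head_share(store_sales))       # share of store sales from the top 20% of titles
print(tail_share(online_sales, 10))  # share of online sales from titles the store cannot stock
```

With these made-up figures, the top two of the store's ten titles account for 80 percent of its sales, while half of the online retailer's sales come from titles beyond the store's ten-title shelf. For a data warehouse, the point of interest is the second list: the catalog grows from 10 rows to 1,010 to capture that second half of sales.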

Mr. Anderson’s subsequent book and blog have expanded this observation beyond media offerings to virtually any industry whose products meet three principles listed by Mr. Anderson:

  1. Easy to make the stuff: electronic media, including software and data;
  2. Easy to distribute the stuff (lower the cost of consumption);
  3. Easy to connect (find).

Most importantly, the list of candidate products can be expected to expand as on-demand physical production of goods becomes more widespread, whether that means printing a book in the store while the customer waits or manufacturing a car to a customer-specific request. This both affects and is affected by the speed and bandwidth of the physical distribution channels for goods: as those channels become faster and cheaper, the list of candidate products grows.

Subsequently, Nassim Nicholas Taleb introduced the concept of the “Black Swan” in his book of the same title. Taleb’s definition is, “A black swan is a large-impact, hard-to-predict, and rare event beyond the realm of normal expectations.” A Black Swan is not necessarily bad: the rise of the internet and PCs can be considered Black Swan events. But we normally associate Black Swans with unanticipated shocks to the system, such as the global financial meltdown starting in 2007. This concept is expanded at Nassim Taleb's home page.

For the Long Tail, the question here is, “What is the effect of this growing new sales and distribution economic model on the EDW?” Is it business as usual? Or are there new demands on the EDW in terms of space, processing power and active monitoring of the enterprise? And what are the effects on space and processing needed to support the Long Tail product offerings related to:

  • History
  • Detail
  • Unstructured data
  • Social Media
  • Multiple, obscure data sources
  • Fraud

For the Black Swan, what are the effects on configuration and disaster recovery planning for the EDW to anticipate:

  • Operational flexibility
  • Spikes in volume: both queries and feeds
  • Unavailability of resources
  • Broadened history needed to adapt

And what is the impact of these concepts on Active Enterprise Intelligence (AEI)?

Over the coming months I will expand on these concepts with customer examples, discussion of new articles of interest, news items, and so on. I would also like to hear your experiences and thoughts on these subjects.