A long, long time ago I worked for Atari. As the story goes, they knew they had something special when the first coin-operated video game machine they installed broke down. After a panicky call from the bar owner, they found the design flaw: the coin box was jammed full of quarters, a sign of overwhelming demand. Was it easy to predict that the video game industry would be a monster hit? If you were paying attention, “Yes.”
Along the same lines, my missed moment of Black Swan prediction also came at Atari. In the ’80s, before the internet was available, the first email system I worked with was IBM’s Professional Office System (PROFS). It ran on expensive IBM mainframe systems (for context, a new mainframe cost $8M at the time). After we installed PROFS and began using it in a limited way, only for internal IT design collaboration, we had to shut it down: usage was so heavy that it consumed an entire mainframe server, which was deemed too high a cost. Why I didn’t immediately go to work for an internet startup after this happened is another story.
So the question is: “How do you predict a Black Swan?” Nassim Taleb’s definition includes “… hard-to-predict, and rare event beyond the realm of normal expectations.” By definition, then, one normally can’t predict the event itself. However, one can try to predict the consequences. Although we can’t predict the events that may bring about another financial meltdown, we can try to insulate ourselves from their effects. For example, Mr. Taleb is starting a hedge fund to profit from hyperinflation. He may not be able to identify the catalyst for another Weimar Republic, or even say whether such an event is probable, but he can try to anticipate the results and hedge a portion of his wealth.
It’s much the same for insulating the Enterprise Data Warehouse (EDW) against unpredictable Black Swan events. We may not be able to predict what the external catalyst might be, but we can try to deal with its possible impact. Beyond disaster recovery scenarios, that impact can take the form of sudden spikes in data volume and query volume.
To hedge against the spikes, the EDW needs to have proactive controls in place:
• For data feed spikes, alternative data load catch-up designs built into the ETL/ELT architecture, design and implementation;
• For data storage volume tsunamis, structures and systems in place to quickly free up space;
• For query storms, throttling designs built in and running even when the current environment is nowhere near query capacity.
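To make the query-storm item concrete, here is a minimal sketch of an always-on concurrency throttle, assuming a simple model where a fixed number of queries may run at once, a bounded number may wait in line, and anything beyond that is rejected. The class name `QueryThrottle` and its `max_running` / `max_queued` parameters are hypothetical illustrations, not any particular EDW product’s workload-management API; real platforms ship their own throttling features, and this only shows the shape of the idea.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryThrottle:
    """Hypothetical always-on concurrency throttle (illustrative only).

    Up to `max_running` queries execute at once, up to `max_queued` more
    wait in FIFO order, and anything beyond that is rejected, so a sudden
    query storm cannot swamp the warehouse.
    """
    max_running: int
    max_queued: int
    running: set = field(default_factory=set)
    queue: deque = field(default_factory=deque)

    def submit(self, query_id: str) -> str:
        """Admit, queue, or reject a query: returns 'running', 'queued' or 'rejected'."""
        if len(self.running) < self.max_running:
            self.running.add(query_id)
            return "running"
        if len(self.queue) < self.max_queued:
            self.queue.append(query_id)
            return "queued"
        return "rejected"

    def finish(self, query_id: str) -> Optional[str]:
        """Mark a query as done and promote the oldest queued query, if any."""
        self.running.discard(query_id)
        if self.queue and len(self.running) < self.max_running:
            next_id = self.queue.popleft()
            self.running.add(next_id)
            return next_id
        return None


if __name__ == "__main__":
    throttle = QueryThrottle(max_running=2, max_queued=1)
    print([throttle.submit(q) for q in ["q1", "q2", "q3", "q4"]])
    # → ['running', 'running', 'queued', 'rejected']
    print(throttle.finish("q1"))  # → q3 (oldest queued query is promoted)
```

The point of keeping the limits active at all times, rather than switching them on once load is already high, is that the control path is exercised and tested every day, so it is known to work when the deluge arrives.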
The key is to be proactive in your architecture, design and implementation: pay a small cost now to hedge against a possible data and query deluge.