By Claudia Imhoff, Guest Contributor
I get asked all the time: is the data warehouse dead?
Not only is this relatively old idea not dead, it's more robust than ever, and it plays an important role alongside new technologies in delivering better analytics.
Is the Enterprise Data Warehouse dead? No, it's alive and well, and it will be around for many years to come. However, its purpose has changed. We build the enterprise data warehouse for different reasons today because it is now part of an extended data warehouse architecture.
There are three disruptions to the old way of doing things, and they are what's behind the new extended data warehouse architecture. One is the advent of new technologies such as Hadoop, NoSQL, and appliances. Then there's the pressure to reduce costs through open source and less expensive ways of doing things. Finally, there is big data, which lets us reach business insights we've never had before.
These drivers of disruption have opened the doors to new technologies for enhanced data management capabilities, new deployment options such as the cloud and on premises (most companies are doing a little bit of both), and finally, the increased adoption of advanced analytics.
This disruption makes for an exciting time but all of these drivers mean that our traditional architectures have to adapt and expand. The extended data warehouse is just such a new architecture that encompasses three big analytic components:
Enterprise Data Warehouse

This is the traditional location for fully integrated, fully cleansed data, and most activities happen in batch mode. What is interesting about the enterprise data warehouse, what Teradata calls its Integrated Data Warehouse, is that it is the production analytic environment for reports and analyses that run on a regular schedule. It's where the trusted data is, the consistent answers to everything. It is used for production reporting and historical comparisons, analyses of the lifetime value of customers, and key performance indicators.
Investigative Computing Platform
The investigative computing platform has been around for a relatively short time. This is where innovative new technologies like Hadoop, in-memory, columnar storage, data compression and appliances shine. The difference between this and the enterprise data warehouse is that the investigative computing platform, like the name suggests, is the experimental area. It is the sandbox where we can play and do unplanned and general investigations. While activities in the Enterprise Data Warehouse are planned and known, the activities in the investigative computing platform are mostly unknown. It is used for complex, difficult queries where you need to explore the data to discover what it can tell you, rather than to provide answers to questions you already know. Examples of unplanned queries are: Did this ever happen? How often did it happen? What are the correlations to our marketplace?
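Queries like these can be sketched in a few lines. The event data, field names, and the "checkout_error" event below are hypothetical, purely to illustrate the kind of unplanned, exploratory questions an investigative computing platform answers.

```python
# Illustrative sketch of unplanned, exploratory queries over raw event
# data in a sandbox environment. All data here is invented.
from collections import Counter

events = [
    {"type": "checkout_error", "region": "west", "day": 1},
    {"type": "checkout_error", "region": "west", "day": 3},
    {"type": "login", "region": "east", "day": 1},
    {"type": "checkout_error", "region": "east", "day": 5},
]

# "Did this ever happen?"
ever_happened = any(e["type"] == "checkout_error" for e in events)

# "How often did it happen?"
frequency = Counter(e["type"] for e in events)["checkout_error"]

# A crude correlation probe: where did the errors cluster?
by_region = Counter(e["region"] for e in events if e["type"] == "checkout_error")

print(ever_happened, frequency, by_region.most_common(1))
```

The point is not the code itself but the workflow: nobody planned these questions in advance, so the environment has to tolerate ad hoc probing of raw data.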
Operational Analytics

Operational analytics is the new kid on the block and has introduced us to an interesting world of real-time analytics on real-time data. The Enterprise Data Warehouse and the Investigative Computing Platform do not run on real-time data – there is some latency in capturing data for those environments. Operational analytics gives us the ability to analyze data as it streams in, in some cases even before it is stored. This is where embedded business intelligence services can be invoked alongside real-time analysis by an analytic engine.
When you bring together both the old and the new technologies, you gain analytic opportunities beyond what any one of them can deliver separately. For instance, combining the Enterprise Data Warehouse with Operational Analytics makes it possible to do things like stock trading analysis, risk analysis, and discovery of correlations between seemingly unrelated data streams – things you've never been able to do before, such as seeing a link between weather and the success of a marketing campaign. This is also where you can run a fraud model against streaming transactions to determine whether a transaction has the characteristics of fraud. If it does, it can be dealt with very quickly.
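A minimal sketch of that fraud scenario might look like the following. The scoring rules, thresholds, and customer profile are invented for illustration; in practice the model would be derived from the warehouse's historical data.

```python
# Hedged sketch: scoring streaming transactions against a simple fraud
# model before they are stored. Rules and thresholds are hypothetical.

def fraud_score(txn, customer_profile):
    """Score a transaction against a customer's historical profile."""
    score = 0.0
    if txn["amount"] > 5 * customer_profile["avg_amount"]:
        score += 0.5  # unusually large amount
    if txn["country"] != customer_profile["home_country"]:
        score += 0.3  # unfamiliar location
    if txn["hour"] < 6:
        score += 0.2  # odd hour for this customer
    return score

profile = {"avg_amount": 40.0, "home_country": "US"}
stream = [
    {"id": 1, "amount": 35.0, "country": "US", "hour": 14},
    {"id": 2, "amount": 900.0, "country": "RO", "hour": 3},
]

# Flag anything scoring at or above the (invented) 0.7 threshold
flagged = [t["id"] for t in stream if fraud_score(t, profile) >= 0.7]
print(flagged)
```

The essential property is that the model is applied in the stream, so a suspect transaction can be held or escalated while it is still in flight.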
We can also create next best offers for customer service reps to suggest to customers while they're on the telephone. This more complex process means that all three environments must contribute analytics to support the CSR. From the Enterprise Data Warehouse, we derive the customer's purchasing history and combine it with their recent history analyzed in the investigative computing platform. That result is then merged with real-time context from operational analytics. The ultimate outcome is a detailed, targeted "next best offer" presented to the representative for use with the customer.
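The flow above can be sketched as a function taking one input from each environment. The data, product names, and selection rule are all hypothetical; they only illustrate how the three contributions combine.

```python
# Sketch: assembling a "next best offer" from the three environments.
# All inputs and the selection logic are invented for illustration.

def next_best_offer(purchase_history, discovered_affinity, live_context):
    """
    purchase_history    -- from the Enterprise Data Warehouse
    discovered_affinity -- from the Investigative Computing Platform
    live_context        -- from Operational Analytics (the call in progress)
    """
    # Consider only products the customer has not already bought
    candidates = [p for p in discovered_affinity if p not in purchase_history]
    # Prefer an offer relevant to what the customer mentioned on the call
    for offer in candidates:
        if offer in live_context["mentioned_products"]:
            return offer
    return candidates[0] if candidates else None

offer = next_best_offer(
    purchase_history={"basic_plan"},
    discovered_affinity=["family_plan", "streaming_addon"],
    live_context={"mentioned_products": {"streaming_addon"}},
)
print(offer)
```

Each argument stands in for what would, in a real deployment, be a query against the corresponding environment.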
These three analytics components have to work together because when they don’t, you not only have silos, you have a chaotic environment. This is where two new technologies really are showing their value to companies, data virtualization and data visualization.
Simply put, data virtualization brings data together virtually and presents it as if it were physically in one place, while in fact it still resides in its original data stores. The key is to understand what you can virtualize and to recognize the technology's limitations. You need a link between the multiple sources of data and the analytics to ensure their integration. If the data requires significant integration or quality processing, then data virtualization is not your tool; you will need data consolidation or ETL processing instead.
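The core idea can be shown in miniature: a single query interface that federates several stores while the rows stay where they are. The class, store names, and schemas below are invented for illustration; real virtualization products do this across databases, Hadoop, and other sources.

```python
# Minimal sketch of data virtualization: one query surface over several
# separate stores, with the data left in place. Everything is invented.

class VirtualView:
    def __init__(self, *sources):
        self.sources = sources  # data stays in its original stores

    def query(self, predicate):
        # Federate: evaluate the predicate against each source, merge results
        return [row for src in self.sources for row in src if predicate(row)]

warehouse_rows = [{"id": 1, "store": "edw", "revenue": 120}]
hadoop_rows = [{"id": 2, "store": "hadoop", "revenue": 300}]

view = VirtualView(warehouse_rows, hadoop_rows)
big = view.query(lambda r: r["revenue"] > 100)
print([r["id"] for r in big])
```

Note what the sketch deliberately omits: heavy integration and data-quality work. As the paragraph above says, when that is needed, consolidation or ETL is the better tool.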
Data visualization is another relatively new technology, employed frequently now as companies collect and analyze more data, and more types of data, for data discovery. We need to be careful not to get so enamored with these beautiful visualizations that we forget they can be difficult for less mathematically or technically oriented people to understand. They're complex, there's a lot going on, and what these graphics mean for the company can be hard to discern. If we're going to create these visualizations, then we also need to create interpretations for all levels of employees so they understand what the nodes, edges, and colors mean.
The future is incredibly bright for our businesses today. Every time I think the pace is slowing down, a new wave of technologies and capabilities comes at us. We now have data science, which has given credibility to a whole industry of bright people called data scientists. New technologies continue to spring up to support more personalized, interactive, and real-time intelligence and communications. While at Teradata 2015 PARTNERS, I had the pleasure of taking part in a live streaming interview via the Periscope app, something no one had even heard of a year ago, in which I talked about this same topic of merging old and new technologies.
Bio: Claudia Imhoff is the president of Intelligent Solutions and founder of the Boulder Business Intelligence Brain Trust (BBBT). Please follow her on Twitter.