I finally caught up on my reading over the holidays – funny what 12 hours of captivity in coach will do – and I highly recommend the November 6, 2010, special report from The Economist on Smart Systems.
The monster data feeds here are smart phones, sensors and meters. We are talking a lot of data: zettabytes, each of which is a billion terabytes.
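For a sense of scale, here is a quick back-of-the-envelope conversion. It assumes decimal (SI) units, and the constant names and the `convert` helper are only mine for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, prefixes).
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21


def convert(amount, from_unit, to_unit):
    """Convert a whole-number amount between two units given in bytes per unit."""
    return amount * from_unit // to_unit


print(convert(1, ZETTABYTE, TERABYTE))  # 1000000000 (a billion terabytes)
print(convert(1, ZETTABYTE, PETABYTE))  # 1000000 (a million petabytes)
```

Put another way, one zettabyte of raw feed has to shrink by a factor of a million before it fits in a single petabyte.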
As an implementation consultant, my first thought is “how do I get this data into a data warehouse?” As usual, the bottleneck is the ETL/ELT process. Note: if the bottleneck is the RDBMS, then you are using the wrong product. :-)
Per the article, IBM Research’s approach is “a collection of specialized chips, each tailor-made to analyze data from a certain type of sensor.” Makes sense: centralized ETL servers are not going to be able to filter and transform this amount of data fast enough to keep the resulting data size within the current petabyte limitations for analysis. Distributing lots of CPU power across dedicated chips seems like a good approach. And, as sensor chips require less and less power, to the point where they can be powered by non-wired sources such as radio waves and independent renewable power, building the filtering chip into the sensor chip should become the default.
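To make "filter at the sensor" a little more concrete, here is a minimal sketch of one common technique, a deadband filter, which only forwards a reading when it has moved far enough from the last value sent. This is my own illustration, not IBM's design: the `Reading` type, the `deadband_filter` function, the sensor name and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    sensor_id: str    # hypothetical sensor identifier
    timestamp: int    # sample number, for illustration only
    value: float      # the measured value


def deadband_filter(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Yield a reading only if it differs from the last emitted value
    for that sensor by more than `threshold`."""
    last_emitted: Dict[str, float] = {}
    for reading in readings:
        previous = last_emitted.get(reading.sensor_id)
        if previous is None or abs(reading.value - previous) > threshold:
            last_emitted[reading.sensor_id] = reading.value
            yield reading


# Made-up sample values from a single meter.
samples = [20.0, 20.1, 20.05, 21.5, 21.6, 19.0]
readings = [Reading("meter-1", t, v) for t, v in enumerate(samples)]
kept = list(deadband_filter(readings, threshold=0.5))

print(len(readings), "->", len(kept))  # 6 -> 3
```

Half the readings never leave the sensor in this toy run. How much a real feed shrinks depends entirely on the signal and the threshold, so treat the ratio as illustrative only.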
Another nice thought from the article is that Twitter is in itself “a kind of collective filter that continuously sorts the content published on the web.” All in all, that means more filtered unstructured data, in addition to the sensor data, to fill the data warehouse coffers. And how about all the real-time location and purchase-based smart phone data?
Zettabytes? How’s that capacity plan coming along? :-)