How to Fix the Data Scientist Shortage

Teradata Employee

The amount of data available for analysis is growing at an exponential rate. As the hardware that captures data becomes cheaper, faster, and omnipresent, the volume collected is outgrowing what current analytical staffs can examine.

Much of this is due to the evolution of smartphones and, more recently thanks to falling costs, the Internet of Things. “Verizon (with help from ABI Research) estimates that as of 2014 there were 1.2 billion different devices connected to the Internet, and that the number will rise to 5.4 billion by 2020 for an annual growth rate of 28 percent... it was the manufacturing sector that saw the fastest growth in adopting IoT products last year, up more than triple since 2013, according to the report...  Other segments deploying IoT devices at a fast-growing rate included finance and insurance companies (up 128 percent year on year), media and entertainment firms (up 120 percent) and the home security and monitoring businesses (up 89 percent).”
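As a quick sanity check on the quoted figures, here is a minimal sketch of the compound annual growth rate implied by going from 1.2 billion devices in 2014 to 5.4 billion in 2020. The function name is my own, and the six-year span is an assumption read off the two dates in the quote.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures from the quote above; the span 2014 -> 2020 is assumed to be six years.
rate = annual_growth_rate(1.2e9, 5.4e9, 2020 - 2014)
print(f"Implied annual growth: {rate:.1%}")
```

The result lands between 28 and 29 percent, which is consistent with the 28 percent figure in the quote.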

To process all of this data, another bottleneck, I/O capacity, will have to join the curve of Moore’s Law, which holds that the number of transistors on a chip doubles roughly every two years.  “The capabilities of many digital electronic devices are strongly linked to Moore's law: quality-adjusted microprocessor prices, memory capacity, sensors and even the number and size of pixels in digital cameras. All of these are improving at roughly exponential rates as well.”  Until recently, it was only cost-effective to store this much data on spinning mechanical disks, whose I/O capacity was not keeping up with advances in semiconductors.  But now, as solid-state storage raises overall disk throughput, I/O capacity will be able to grow at the same rate as the surrounding CPU and network hardware, and so keep up with the growth of data.

So how will the analytical side of the house keep up with the staffing required to get value out of all this data?  I do not believe that the personnel strategies of the past can keep pace with exponential data growth.  All companies will be competing for the same people and, unlike with the quants on Wall Street, the marginal gain from each new data scientist is not as visible.  As a result, salaries will be capped at a level that limits supply.

I believe that the answer to growing the brain power needed for this exponential growth in analysis will have to be machine learning and artificial intelligence.  For example, in cybersecurity, machine learning allows skilled threat hunters to offload work to computers.  “With machine learning, tier-three network analysts can offload much of the heavy lifting that helps them distinguish a threat worth pursuing from legitimate activity requiring no additional investigation. By allowing the machine to do this work, tier-three analysts can spend more time pursuing the machine-identified threats. The added benefit is that tier-two analysts learn from the machine decisions that were initialized by the advanced network analysts.”
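To make the idea of offloading triage concrete, here is a toy sketch. It is not a real machine learning model and not the method described in the quote: it swaps in a plain z-score test against a baseline as a stand-in for a trained classifier. The host names, login counts, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def triage(events, baseline, threshold=3.0):
    """Split (name, value) events into (escalate, dismiss) lists.

    An event is escalated when its value lies at least `threshold`
    sample standard deviations away from the baseline mean.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    escalate, dismiss = [], []
    for name, value in events:
        # Guard against a flat baseline, where the z-score is undefined.
        z = (value - mu) / sigma if sigma > 0 else 0.0
        (escalate if abs(z) >= threshold else dismiss).append(name)
    return escalate, dismiss


# Hypothetical hourly login counts observed during normal operation.
baseline_logins = [10, 12, 11, 9, 13, 10, 11, 12]
# Hypothetical new observations for the machine to sort.
events = [("host-a", 11), ("host-b", 95), ("host-c", 13)]

escalate, dismiss = triage(events, baseline_logins)
print("Escalate to analyst:", escalate)
print("No further investigation:", dismiss)
```

In this sketch only host-b reaches the analyst, because its count is far outside the baseline. The other two hosts are dismissed automatically, which is the kind of heavy lifting the quote describes.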

This can be extended to all industries that are now beginning to justify the cost of capturing and storing all the data coming in from the world around them.  “Wired” magazine had quite a nice analogy from the past for your elevator discussion:  “The AI industry to Big Data is as petrochemical industry was to crude oil. We have the promise of doing more with the Big-Data crude than to simply burn it.”  You can throw in the famous reference to “plastics” in the film “The Graduate.”

This is going to have to be the fix for the data scientist shortage in the future:  we can’t expect to create data scientists at an ever-increasing rate.  Instead, they will become the teachers of the AI.  Now where did we place those Laws of Robotics…?

Other links of interest:

http://www.datanami.com/2015/03/24/achieve-business-value-with-machine-learning/

http://www.wired.com/2014/03/use-data-tell-future-understanding-machine-learning/

2 Comments
Teradata Employee

I love the comment about the Law of Robotics.... funny, and totally true!
