Data Science: What Makes the Grade?

Learn Data Science
Teradata Employee

Data science is a challenging profession. As many blogs have pointed out, it is not for the faint-hearted, but that doesn't mean one needs a cool graduate-level degree in abstract math.

Data scientists hired by organizations are tasked with solving business problems, but they often end up coding endlessly without generating value quickly. It goes like this: IT works through a checklist and hands things off to the data scientist.

  • Python and R versions checked
  • Connectivity working
  • Spark cluster connectivity working with third-party libraries
  • One year of data loaded into HDFS
  • Aster Database running in the enterprise, with connectors configured for DL

The business then drops a problem like: 'Can you guys figure out our churn problem in a week? I want to know why people are signing up and then not using our website. Find me something that leads to a prescriptive action.'

The data scientist goes to work: parses the logs, extracts and creates features, spends a month building some models, and delivers a confusion matrix and ROC/AUC plot at the end. The business glazes over at that and asks: 'So what are you telling me here, that your churn model works?'

Data Scientist: 'Yeah, we are getting 95%+ accuracy in churn prediction. We are ready to deploy.'

Business: 'So what's the root cause behind the churn prediction?'

Data Scientist: 'Well, the significant variables are related to the average time spent on certain pages, like the FAQ, Terms of Service, etc.'

Business: 'That's a great insight. Thanks for finding that feature. But why are they going to the FAQ and Terms of Service pages in the first place?'

The data scientist spends another few weeks trying to answer that question, only to get lost trying to identify the core problem or root cause. The project spirals into a vortex of similar back-and-forth exchanges. Something is missing. The business moves on.


Thinking like a business person:

Given the choices we have today, most folks in the community would agree that it is not the tool *alone* that ultimately solves the business problem. It is understanding the business's needs, and the data scientist's ability to put themselves in the business's shoes.

You could blame the data, loading speed, the time taken to iterate on discovery, the stability of the cluster, and so on, but if a data scientist does not understand the business problem, weeks can go by with no useful results.

Why is this gap so common?

Businesses rely on data scientists to be their eyes and ears on the underlying data. Data scientists are, by definition, also expected to speak the language of the business. However, data scientists may well be spending much of their time on new tools and methods (shiny GitHub objects or the algorithm of the day). The business may not know what to ask of the data, and data scientists often may not even know where to look!

If only the data scientist knew to ask the relevant questions to tease out the use case...

How can this be resolved?

  • Educating data scientists on business problems.
  • Investing in tools with which data scientists can create portals that the business can play with, drive, and use to flag gaps. Creating a clear, value-added interface to the data will only bring better problems to the data science teams.
  • Standardizing on an interface that provides a workable abstraction for both the data scientist and the business: This is the single biggest challenge when we use technologies that require extensive coding as the means to get to the art of analytics, a.k.a. communicating with the business. In my personal experience, what works is less coding, more visuals, and a way to hand the business the knobs to turn and question the data. Can you socialize the analytics and insights continuously to the business?
  • Talking to vendors who focus on business outcomes and have done this many times over, all the way from managing a data lake to delivering models in real time, and who can guide you in talking to the business.


In other words, if your data scientist cannot create a continuous story, narrative, or stream from the data that the business understands, the business will find its own way to rationalize what is working. That is bad for data science and a losing proposition for the business.

1 Comment

I think you hit the mark right on! IMHO we should stop using the term "data scientist". The most effective folks in industry with the data scientist title are really business analysts. What they do, that is, using data to understand and improve a business, is based on science, but it is not science. Further, again IMHO, a "scientist" is someone with a Ph.D. in a mathematical discipline... my friends in biology and computer "science" will hate me for this.