He was in the clear. With only about 100 meters to go in the steeplechase in April 2015, Oregon’s Tanguy Pepiot had a commanding 30-meter lead over Washington’s Meron Simon. Then he began celebrating, waving his arms to spur the crowd to cheer well before he crossed the finish line, and he finished a humiliating second.
Ok, so what does this have to do with data science?
Many organizations are jumping into the data science pool too early, with both feet. They are not considering what foundations should be in place and what preparations are needed. Just as Tanguy celebrated too early, many organizations are filling advanced analytics teams with costly data scientists, thinking that will automatically leapfrog them ahead of their competition in analytic sophistication. Ultimately, these premature investments may cost them the race.
Let me say that I have been doing “data science” for the past 20 years, and I have personally seen analytic initiatives substantially improve the P&L trajectory. You will not find anyone more bullish on the value of data scientists than me. So, I know that data scientists can add huge value… but not always. Your organization must be prepared to enable them to succeed.
Before there is substantial investment in a data science team, there must be some investment in data technology and the organization.
Data Technology Needs:
Data scientists will test your data platform capabilities and data quality as no group has ever done. If you aren’t prepared for this, the data science team and IT will grow frustrated, productivity will suffer, dysfunction will set in, and high turnover will result. For example, data scientists will uncover data quality issues and will simply want them fixed. If this is not communicated effectively, IT might interpret it as an attack on the quality of their work, even though that specific data may not have been relevant to any use case that would have been checked. No fault is implied, but one could be heard. More about this later…
To be successful, the following technology components should be in place:
Quality data infrastructure… with room to grow
Data scientists will access data, lots of data… and do it often. In a database environment, for example, they will often write very complex queries that join many large tables. You should expect your database utilization to increase, and possibly explode. So be prepared.
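To make that load concrete, here is a minimal sketch of the kind of multi-join, full-scan analytical query a data scientist might run against granular data. It uses Python’s built-in sqlite3 as a stand-in for a real warehouse, and the three-table schema (customers, orders, order_items) is entirely invented for illustration:

```python
import sqlite3

# In-memory database as a stand-in for a data warehouse.
# The schema and all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, unit_price REAL);
""")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "retail"), (2, "wholesale")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, "2015-01-05"), (11, 2, "2015-02-10")])
cur.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)",
                [(10, 100, 2, 9.99), (10, 101, 1, 24.50),
                 (11, 100, 40, 8.25)])

# A typical exploratory query: join granular line items up through
# orders to customers, then aggregate. At warehouse scale this kind
# of query scans and joins entire tables, which is what drives
# utilization up.
rows = cur.execute("""
    SELECT c.segment,
           COUNT(DISTINCT o.order_id)       AS orders,
           SUM(oi.quantity * oi.unit_price) AS revenue
    FROM customers c
    JOIN orders o       ON o.customer_id = c.customer_id
    JOIN order_items oi ON oi.order_id   = o.order_id
    GROUP BY c.segment
    ORDER BY revenue DESC
""").fetchall()
print(rows)
```

A handful of these per analyst per day is manageable; dozens of them against billion-row tables is the utilization spike to plan for.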
Easy access to raw, granular data
This one gets me in trouble with many seasoned IT professionals, as this access will create havoc with unprepared data systems and technology. However, many of the use cases that are best addressed by data science techniques require data at the lowest level of granularity. If data scientists are only provided cleansed, aggregated data, their contributions will be limited. It is certainly appropriate to maintain oversight of the activities that data scientists are involved in. However, typical controls need to be loosened to allow discoveries to occur.
A space for data scientists to play
This has been called a “sandbox” or “discovery zone”, but ultimately needs to be an area within an analytics environment that the data scientists can control and use to store and process data. It should be a space that is very efficient at storing large volumes of data and is accessible by their analytical tools. It sometimes even needs to be separated from the other data servers so that the intense processing does not interfere with critical production jobs.
Solid reporting/BI infrastructure
Data scientists will uncover many new KPIs and also develop improvements to business processes. In both cases, they will need to make those items visible to the organization. New KPIs need to be integrated into dashboards and various reporting tools (such as OLAP). Improvements to business processes need to be tested, monitored and measured, which requires regular reporting. Without an established reporting infrastructure, these things will be very difficult.
Organizational Needs:
Sabermetrics is analytics applied to baseball, often used to assign a value to each player based on his projected productivity. It was made famous by the Oakland Athletics in the early 2000s. With one of the lowest payrolls in MLB, they reached the playoffs four consecutive years, proving the value of analytics in baseball. Fast forward to today, where the Los Angeles Lakers, who just had the worst season in their illustrious history, have a team of analysts. However, their coach Byron Scott (an old-school coach who relies on his rich experience) refuses to use their recommendations. This is clearly not a “data driven” organization. Do you think that team of analysts is motivated? Are they effective? Impactful? Because there is so much dysfunction, should the Lakers even have them? These are the issues that every organization must avoid.
The following organizational components should be in place:
High-level, influential executive champion
Are all groups open to testing or acting on the data science team’s findings? A strong executive champion will ensure that is the case, and will also help create a “data driven” culture that replaces decisions made solely on experience. This will ensure that findings and ideas generated by the data science team are utilized by key decision makers.
Good relationship between business, analytics and IT
This is probably the most critical organizational requirement. Unfortunately, in most organizations, things simply don’t get done because the groups don’t work toward the same objectives. IT is generally incented to “keep the lights on” by making sure that mission-critical processes work without fail. Business is encouraged to innovate, take risks, find opportunities to adjust the business approach, and squeeze out additional revenue. Analytics teams generally follow the business objectives, but create prototypes for data solutions that are not always in sync with current IT systems and priorities. If these groups all march to conflicting objectives, any analytical innovation can get stuck in a political abyss.
Willingness to invest development resources to operationalize complex data science solutions
It can be extremely difficult to turn prototyped data science processes into reliable operational processes. Also, it will often be unclear what the business impact will be before testing, which in many cases requires a substantial investment. If the organization is not willing to invest in turning these prototypes into operational code, many of the data science team’s efforts will not lead to action and will not generate value. What a waste! There needs to be an up-front commitment to invest in and act on many of these findings.
A resource available to answer data questions
Good data scientists will look at data in ways it has never been looked at before. Therefore, they generate frequent, involved questions and need to get those questions answered quickly. While a data dictionary would in theory be a solid solution, I have never seen a good one built and reliably maintained. So, if one is not available, there needs to be a set of “data experts” who know the data and know how to get answers for what they don’t know. They need to partner with the data scientists and should respond to questions quickly… in minutes or hours, not days.
Note that not all of the above items are absolutely required. There are many examples of companies with very effective data science teams that are missing one or more of these business and technology requirements. However, how much more effective could they be if the friction of the missing pieces were removed? Do not ignore these basics if you truly want to be on a path to win the race.
Originally Published on May 7, 2015 by Teradata Forbes Voice