When Discovery becomes Repeatable

Teradata Employee

Consider a real-world scenario. A team of data scientists worked with the business to discover patterns of behavior related to churn along one of its subscription-based product lines. They were given eight weeks to show whether they could do it. If not, it was on to the next problem or back to the drawing board.

They collected data from three sources (data acquisition), applied transformations to prepare the data for analytics (data preparation), built a churn prediction model (analytics), and then presented their findings to the business through visualizations and lists of churning customers (visualization and information delivery). They tested the model for several months and, sure enough, by making some business adjustments to service center operations and other customer-facing systems, they saved the organization two million dollars by retaining high-quality customers. The system was built using the standard ‘Discovery Process.’ They were so successful that they were then tasked with turning over requirements to make that operation repeatable.
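The four stages described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the article does not name the three sources, the fields, or the model, so the record layout, the sample values, the scoring weights, and the threshold below are all hypothetical, and the hand-written score stands in for whatever model the team actually trained.

```python
from dataclasses import dataclass


# Hypothetical record layout; the article does not list the real fields.
@dataclass
class Customer:
    customer_id: int
    months_subscribed: int
    support_calls: int
    late_payments: int


def acquire():
    """Data acquisition: stand-ins for the three unnamed sources, keyed by customer id."""
    subscriptions = {1: 24, 2: 3, 3: 12}   # months subscribed
    service_calls = {1: 0, 2: 5, 3: 1}     # calls to the service center
    billing = {1: 0, 2: 2, 3: 0}           # late payments
    return subscriptions, service_calls, billing


def prepare(subscriptions, service_calls, billing):
    """Data preparation: join the sources into one row per customer."""
    return [
        Customer(cid, months, service_calls.get(cid, 0), billing.get(cid, 0))
        for cid, months in sorted(subscriptions.items())
    ]


def churn_score(c):
    """Analytics: a toy hand-written score (assumed weights), not a trained model."""
    score = 0.1 * c.support_calls + 0.2 * c.late_payments
    if c.months_subscribed < 6:
        score += 0.3
    return min(score, 1.0)


def deliver(customers, threshold=0.5):
    """Information delivery: ids of customers flagged as likely to churn."""
    return [c.customer_id for c in customers if churn_score(c) >= threshold]


def run_pipeline():
    """Run acquisition, preparation, analytics, and delivery in order."""
    return deliver(prepare(*acquire()))
```

Keeping each stage as its own function mirrors the way the stages are named in the text, and it makes it easier to hand over requirements for each stage separately when the work is later made repeatable.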

Let’s frame this activity properly. Someone had an idea: what if we could bring together customer information from a variety of sources and channels and identify a list of customers who could be on the path to churn? What the team is not doing is being burdened by process, project approvals, paperwork, and standards and procedures. They also have a discovery environment in place to support massive and rapid data collection and analytical application development. They aren’t concerned with project management, timelines, standards, naming conventions, or other internal teams. The leadership of the organization is behind them because it hopes to raise revenue and lower costs by retaining customers, so cooperation from data source owners is not a problem. Leadership is also stressing that the business work with the data team to understand how to operationalize findings.

Conclusion: The organization is behind the team, and they have the equipment and commodity-based data science talent around ANSI SQL. The business is willing to share data and operationalize findings. They are successful and are then asked to make the work repeatable and reliable, and it is expected to produce outputs at the end of every month. This is no longer an ad-hoc project but a repeatable one, with deadlines and deliverables, and it is now business critical. It will become production.
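To make the "end of every month" expectation concrete, here is a minimal sketch of how a scheduled job might decide that a run is due and label its deliverable. The function names and the file naming scheme are assumptions for illustration only; the article does not say how the monthly run is scheduled or how outputs are named.

```python
import calendar
from datetime import date


def month_end(day):
    """Return the last calendar day of the month containing `day`."""
    last = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=last)


def is_run_day(today):
    """True when the monthly production run is due (last day of the month)."""
    return today == month_end(today)


def output_name(today):
    """Hypothetical naming scheme so each month's deliverable is identifiable."""
    return f"churn_list_{today:%Y_%m}.csv"
```

For example, a daily scheduler could call `is_run_day(date.today())` and only trigger the churn job when it returns true, writing the result under `output_name(date.today())`.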

Now that this is production, we need to wrap it in all the traditional software project methodologies, standards, and practices, along with other Information Technology governance practices.

We also have to consider investing in a new environment for ‘Product Discovery’ or other options. As mentioned in the previous section of this document, production environments are not well suited to ad-hoc activities like the ones supported in a discovery environment. This is just one operationalized discovery project, so for now it might make sense to add capacity instead of investing in a new environment.