Figure 4 shows the overall CD4ML flow for this specific case. The data scientists train the model using data from the marketplace — such as car specs, asking price and actual sales price. The model then predicts a price based on the car model, age, mileage, engine type, equipment, etc.
Before training a model, there’s a lot of data cleanup work to be done: detecting outliers, wrong listings or dirty data. This is the first quality gate to be automated — is there enough good data to even provide a prediction model for a certain car model?
Once the trained model can make sufficiently accurate price estimates, it’s exported as a productionizable artifact, — a JAR or a pickle file. This is the second quality gate: is the model’s error rate acceptable?
This prediction model is then transformed into a format matching the target platform, then packaged, wrapped and integrated into a deployable artifact — a prediction service JAR containing a web server or a container image which can be readily deployed into a production environment. This deployment artifact is now tested again, this time in an end-to-end fashion: is it still producing the same results as the original, non-integrated, prediction model? Does it behave correctly in a production environment, for instance, does it adhere to contracts specified by other consuming services? This is the third quality gate.
If all of those three quality gates succeed, a new re-trained price prediction service is deployed and released. Importantly, all of those steps should be automated so that re-training to reflect the latest market changes happens without manual intervention as long as all quality gates are satisfied.
Finally, the live price prediction is continuously monitored: how do the sellers react to the price recommendations? How much is the listing price deviating from the suggestion? How close is the price prediction to the final buying price of the respective vehicle? Is the overall conversion and user experience being impacted, for instance by rising complaints or direct positive feedback? In some cases, it makes sense to deploy the new model next to the old version to compare their performance. All of this new data then informs the next iteration of training the prediction model, either directly through new data from cars that were sold or by tweaking the model’s hyperparameters based on user feedback, which closes the Continuous Intelligence cycle.
Opportunities of CD4ML and the road ahead
Adopting Continuous Delivery for Machine Learning creates new opportunities to become an Intelligent Enterprise. By automating the end-to-end process from experimentation to deployment to monitoring in production, CD4ML becomes a strategic enabler to the business. It creates a technological capability that yields a competitive advantage. It allows your organization to incorporate learning and feedback into the process, towards a path of continuous improvement.
This approach also breaks down the silos between different teams and skill sets, shifting towards a cross-functional and collaborative structure to deliver value. It allows you to rethink your organizational structures and technology landscape to create teams and systems aligned to business outcomes. In subsequent articles in the series, we’ll explore how to bring product thinking into the data and machine learning world, as well as the importance of creating a culture that supports Continuous Intelligence.
Another key opportunity to implement CD4ML successfully is to apply platform thinking at the data infrastructure level. This enables teams to quickly build and release new machine learning and insight products without having to reinvent or duplicate efforts to build common components from scratch. We’ll dedicate an entire article to the technical components, tools, techniques, and automation infrastructure that can help you to implement CD4ML.
Finally, leveraging automation and open standards, CD4ML can provide the means to build a robust data and architecture governance process within the organization. It allows introducing processes to check fairness, bias, compliance, or other quality attributes within your models on their path to production. Like Continuous Delivery for software development, CD4ML allows you to manage the risks of releasing changes to production at speed, in a safe and reliable fashion.
All in all, Continuous Delivery for Machine Learning moves the development of such applications from proof-of-concept programming to professional state-of-the-art software engineering.
Read Part 1 of this series to explore models of enterprise intelligence: to understand how businesses use data to derive insights today, and how they can improve.
Part 2 delves deeper into patterns of enterprise intelligence and identifies the points of friction and opportunities for improvement in the Continuous Intelligence cycle.
This article was published on July 5, 2019