ML Ops and the Promise of Machine Learning at Scale



The good news: Enterprise interest in artificial intelligence, fueled by machine learning, continues to expand. In its most recent survey on AI Adoption in the Enterprise, O’Reilly found that 85% of organizations are at least exploring the use of artificial intelligence.

The bad news: Too many artificial intelligence projects fail. Currently, an estimated 78%-87% of artificial intelligence projects never make it into production.

One issue is the difficulty of moving machine learning models from development to production. Building a machine learning model in Python is one thing. Scaling it to a production environment in a way that meshes with an organization’s culture is far more challenging.

In the Beginning, There Was DevOps

This is not a new story. Software developers faced similar problems 20 years ago. As Nik Bates-Haus describes in Getting DataOps Right, published by O’Reilly, early software engineering projects were often plagued by high costs, slow delivery, poor quality, low user satisfaction, and failure to adapt to changing user requirements. A key problem was that the development and production processes typically had little if any connection.

In 2001, 17 leaders in the software development community gathered in Utah to discuss solutions to the problems their projects faced. They issued the Manifesto for Agile Software Development, which states:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

    • Individuals and interactions over processes and tools
    • Working software over comprehensive documentation
    • Customer collaboration over contract negotiation
    • Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

By 2010, this gave rise to DevOps—an organizational model that brings software development and IT operations together.

The game began to change. Innovation in software development accelerated and became scalable. Problems with new code could be identified and rectified more quickly. A culture of continuous delivery and continuous integration of new software became commonplace. DevOps did not solve every problem, but it did become a standard approach in many software-reliant organizations.

Taking the Next Step to ML Ops

How might this thinking apply to machine learning projects?

As Cristiano Breuel points out in his post, ML Ops: Machine Learning as an Engineering Discipline, machine learning projects have an added layer of complexity. A successful machine learning project must, from an engineering perspective, coordinate machine learning code pipelines and data pipelines.

Within a machine learning project, data must be validated and prepared, addressing issues like missing values, varying data formats, data errors or inconsistencies, and outliers. Integrating data from disparate sources is a major challenge. These issues become far more complex when working with large-scale or quickly changing data, and they are compounded further by organizational changes such as mergers or acquisitions. Data consistency, security, and access are essential.
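To make the validation step concrete, here is a minimal sketch in plain Python. The records, field names, accepted date formats, and the outlier rule (a multiple of the median) are all assumptions chosen for illustration, not a prescribed method; a real project would likely rely on a dedicated data validation tool.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "signup": "2021-04-13", "spend": 120.0},
    {"id": 2, "signup": "04/14/2021", "spend": 95.5},
    {"id": 3, "signup": "2021-04-15", "spend": None},
    {"id": 4, "signup": "2021-04-16", "spend": 110.0},
    {"id": 5, "signup": "2021-04-17", "spend": 9800.0},
]

# Assumed date formats that appear across the data sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format and return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def validate(records, outlier_factor=5.0):
    """Return cleaned records plus a list of (id, issue) findings."""
    issues = []
    spends = [r["spend"] for r in records if r["spend"] is not None]
    typical = median(spends)
    cleaned = []
    for r in records:
        date = parse_date(r["signup"])
        if date is None:
            issues.append((r["id"], "unparseable date"))
            continue
        if r["spend"] is None:
            issues.append((r["id"], "missing spend"))
            continue
        # Outliers are flagged for review but kept in the cleaned data.
        if r["spend"] > outlier_factor * typical:
            issues.append((r["id"], "possible outlier"))
        cleaned.append(
            {"id": r["id"], "signup": date.isoformat(), "spend": r["spend"]}
        )
    return cleaned, issues


cleaned, issues = validate(RECORDS)
```

In this sketch, dates are normalized to one format, rows with missing values are dropped, and suspicious values are reported rather than silently removed, so a person can decide what to do with them.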

In the last decade, data engineering emerged as a field with its own DataOps protocols, again tying development to production. In his chapter in Getting DataOps Right, Andy Palmer offers some guiding principles for DataOps:

  • High levels of automation to foster repeatability and to remove whenever possible the delays and errors inherent in human processing of data tasks
  • An open best-of-breed tool approach that can adapt to quickly changing data environments and user requirements
  • Careful attention to the integration of data systems
  • Tracking of data lineage and provenance
  • Layered user interfaces that match differing needs, skill levels, and access rights of individuals across an organization
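As one illustration of the lineage and provenance principle above, the sketch below records a content hash for each version of a dataset and links it to the version it came from. The function names, the entry layout, and the file name `crm_export.csv` are hypothetical; this is only one simple way such tracking might look.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 hash of a dataset's contents."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_step(lineage, step_name, source, rows):
    """Return a new lineage list with one entry describing this data version."""
    entry = {
        "step": step_name,
        "source": source,
        "rows": len(rows),
        "sha256": fingerprint(rows),
        # Link to the previous version so the chain can be traced back.
        "parent": lineage[-1]["sha256"] if lineage else None,
    }
    return lineage + [entry]


raw = [{"id": 1, "spend": 120.0}, {"id": 2, "spend": None}]
clean = [r for r in raw if r["spend"] is not None]

lineage = record_step([], "ingest", "crm_export.csv", raw)  # hypothetical file
lineage = record_step(lineage, "drop_missing", "ingest", clean)
```

Each entry answers two questions: where did this version of the data come from, and has it changed since it was recorded?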

ML Ops and Value Creation

Most recently, ML Ops has come onto the scene as an application of the DevOps approach to all aspects of machine learning projects. ML Ops is both a philosophy and a way of organizing human and technical resources.

Machine learning models are dynamic. The initial model development stage is often quite experimental, involving many iterations of candidate models. Doing this effectively at scale requires good version control. Data used to develop models must be validated and appropriately split into training and testing sets. Models must be validated, both in terms of their technical performance and in terms of their effectiveness in addressing the needs for which they were created. As no model is perfect, this requires a range of metrics and judgments. After deployment, models must be monitored to detect performance degradation as circumstances change—and to enable rapid model updates when needed.
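Two of the steps described above, splitting data and monitoring a deployed model, can be sketched in a few lines of plain Python. The 80/20 split, the fixed seed, the use of accuracy as the metric, and the 0.05 tolerance are assumptions for illustration; real projects choose metrics and thresholds to suit the problem.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows reproducibly and split into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag the model when recent accuracy drops more than tolerance below baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance


train, test = train_test_split(range(100))
```

Fixing the seed makes the split repeatable, which matters when many candidate models are compared. The monitoring check is deliberately simple: it compares accuracy measured at deployment with accuracy measured on recent data and signals when the gap grows too large.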

ML Ops is a new field that looks to build upon DevOps and DataOps to address these challenges. Some goals of an ML Ops approach, identified by its sponsoring organization, include:

  • Unifying release cycles for machine learning and software applications
  • Automating data validation testing, model testing, and model integration testing
  • Enabling the application of Agile methodologies to machine learning projects
  • Fully embedding machine learning projects in larger continuous integration and continuous delivery (CI/CD) production pipelines
  • Functioning with an agnostic approach to language, framework, infrastructure, and practice
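The automated testing goal in the list above can be pictured as a release gate that a continuous integration job runs before a model ships. The sketch below is hypothetical throughout: the check functions, the toy threshold model, and the 0.9 accuracy bar are stand-ins, not part of any particular ML Ops tool.

```python
def check_data(rows, required_fields):
    """Data validation test: every row has every required field, non-null."""
    return all(row.get(f) is not None for row in rows for f in required_fields)


def check_model(predict, examples, min_accuracy):
    """Model test: accuracy on held-out examples meets a minimum bar."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples) >= min_accuracy


def check_integration(predict, sample_input, allowed_outputs):
    """Integration test: the model returns a value the application accepts."""
    return predict(sample_input) in allowed_outputs


def release_gate(checks):
    """Run named checks; return (ok, failures) so a CI job can block a release."""
    failures = [name for name, check in checks.items() if not check()]
    return (not failures, failures)


# Hypothetical stand-in model: flags transactions above a fixed threshold.
def toy_model(amount):
    return "review" if amount > 1000 else "approve"


ROWS = [{"amount": 50}, {"amount": 2500}]
HELD_OUT = [(50, "approve"), (2500, "review"), (900, "approve"), (1200, "review")]

ok, failures = release_gate({
    "data": lambda: check_data(ROWS, ["amount"]),
    "model": lambda: check_model(toy_model, HELD_OUT, min_accuracy=0.9),
    "integration": lambda: check_integration(toy_model, 10, {"approve", "review"}),
})
```

Because the gate returns the names of failing checks, the same function can serve both the model and the surrounding application, which is one way to move toward a unified release cycle.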

My colleague John Aaron offers a simple way to view this: Profitability = Information Gain x Execution. Information gain is what machine learning produces—new and useful information gleaned from patterns in data. Execution is a more subtle art, requiring leadership, adaptability, and coordination of technical and human resources.

Keeping one’s focus on producing value is the most important lesson we look to impart in Elmhurst University’s Master’s in Data Science and Analytics program. ML Ops provides one approach for doing so from an engineering perspective.

About the Author

Jim Kulich is a professor in the Department of Computer Science and Information Systems at Elmhurst University. Jim directs Elmhurst’s master’s program in data science and analytics and teaches courses to graduate students who come to the program from a wide range of professional backgrounds.

Illustration by Tanner Wayment
Posted April 13, 2021
