INDUSTRIALISING MACHINE LEARNING II

We’ve touched upon the importance of data quality and what we might term a data-centric strategy for machine learning development. Now, what’s your understanding of a machine learning pipeline?

It is important to understand that there is a machine learning model and then there is a machine learning pipeline. The model is just a set of coefficients that defines the mapping between the features of a new instance and the target variables.
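
To make the distinction concrete, here is a minimal Python sketch, assuming a simple linear model and a single standardisation step; every class name below is hypothetical and not taken from any particular library. The model is only its coefficients, while the pipeline also carries the preprocessing parameters learned from the training data, so the same transformation is applied at prediction time.

```python
# Hypothetical sketch: a "model" is just coefficients, while a "pipeline"
# also bundles the preprocessing that must run identically at training and
# prediction time. Names are illustrative only.
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class LinearModel:
    """The model: nothing more than a set of coefficients plus an intercept."""
    weights: list[float]
    bias: float

    def predict(self, features: list[float]) -> float:
        # Map the (already prepared) features of one instance to the target.
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


@dataclass
class Standardiser:
    """A preprocessing step whose parameters are learned from training data."""
    means: list[float] = field(default_factory=list)
    stds: list[float] = field(default_factory=list)

    def fit(self, rows: list[list[float]]) -> "Standardiser":
        columns = list(zip(*rows))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 for zero-variance columns to avoid division by zero.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        return self

    def transform(self, row: list[float]) -> list[float]:
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]


@dataclass
class Pipeline:
    """The pipeline: preprocessing plus model, deployed and applied as one unit."""
    scaler: Standardiser
    model: LinearModel

    def predict(self, raw_row: list[float]) -> float:
        return self.model.predict(self.scaler.transform(raw_row))


scaler = Standardiser().fit([[1.0, 10.0], [3.0, 30.0]])
pipeline = Pipeline(scaler, LinearModel(weights=[0.5, 0.5], bias=1.0))
print(pipeline.predict([3.0, 30.0]))  # 2.0
```

The point of the sketch is that the scaler's means and standard deviations are part of what gets deployed: shipping the coefficients alone would produce wrong predictions on raw inputs.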

One thing that people need to understand is the concept of model performance, model drift and data drift. How does this fit into MLOps, in terms of managing models and drift?

Two distinct conversations need to be explored. One is about MLOps.

Data drift is so often misunderstood, particularly by business leaders rather than technical people. Why is drift monitoring important, and why does it matter for managing model performance when you have models in production?

In the case of data drift: all machine learning methods, including deep learning as a subclass of machine learning, assume that the data come from a particular distribution, and a model works only if the data it receives come from the same distribution it was trained on. If the distribution changes but the model was trained on samples from the previous version of the distribution, it will start making mistakes.
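
One common way to monitor for this kind of drift, shown here as an illustrative sketch rather than the interviewee's method, is to compare a feature's training sample with a recent production sample using the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical distribution functions. The function names and the 0.2 alert threshold are hypothetical choices for illustration, not recommended values.

```python
# Hypothetical drift check: two-sample Kolmogorov-Smirnov statistic computed
# with the standard library only. The 0.2 threshold is illustrative.
from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Return max |F_ref(x) - F_cur(x)| over all observed values x."""
    ref = sorted(reference)
    cur = sorted(current)
    n_ref, n_cur = len(ref), len(cur)
    gap = 0.0
    # Empirical CDFs only jump at sample points, so checking those is enough.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n_ref  # fraction of reference <= x
        f_cur = bisect_right(cur, x) / n_cur  # fraction of current <= x
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def has_drifted(reference: list[float], current: list[float],
                threshold: float = 0.2) -> bool:
    # Flag drift when the two distributions differ by more than the threshold.
    return ks_statistic(reference, current) > threshold


training = [0.1 * i for i in range(100)]          # values 0.0 .. 9.9
production = [0.1 * i + 5.0 for i in range(100)]  # same shape, shifted by 5.0
print(ks_statistic(training, training))  # 0.0
print(has_drifted(training, production))  # True
```

In a monitoring job, a check like this would run per feature on a schedule, with an alert (or a retraining trigger) when drift is flagged.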

How should an organization that is thinking about building machine learning applications approach designing a pipeline for a specific industry use case, and what are the key considerations in designing that pipeline?

Firstly, we need to understand that the impact of AI is not immune to the Pareto principle: roughly 80% to 90% of the value will come from 20% of the potential use cases. So the first thing any industry should do is find the high-value, low-hanging use cases that are good candidates for machine learning applications. A big advantage is that even for industry-specific problems, the starting phase is quite generalized.
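
The Pareto reasoning can be illustrated with a small prioritisation sketch: rank candidate use cases by estimated value and keep the smallest set that covers a target share (80% here) of the total. The use-case labels and value scores are hypothetical placeholders, not real estimates, and the function name is illustrative.

```python
# Hypothetical Pareto shortlist: rank use cases by estimated value and keep
# the smallest top-ranked set covering target_share of the total value.
def pareto_shortlist(values: dict[str, float],
                     target_share: float = 0.8) -> list[str]:
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    shortlist: list[str] = []
    covered = 0.0
    for name, value in ranked:
        if covered >= target_share * total:
            break  # target share already reached
        shortlist.append(name)
        covered += value
    return shortlist


# Placeholder value scores for ten hypothetical use cases (they sum to 100).
estimated_value = {
    "A": 52.0, "B": 29.0, "C": 5.0, "D": 4.0, "E": 3.0,
    "F": 2.0, "G": 2.0, "H": 1.0, "I": 1.0, "J": 1.0,
}
print(pareto_shortlist(estimated_value))  # ['A', 'B']
```

With these made-up numbers, two of the ten use cases (20%) account for 81% of the estimated value; in practice the scores would also need to reflect feasibility, which is what "low-hanging" captures.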

What are your last thoughts when considering the issue around machine learning pipelines and data-centric machine learning and what are your recommendations?

Firstly, it’s important to think about the problem at the platform level and to identify the building blocks you could reuse within your organization to enable more AI adoption over time. As recently as 2015, Google had only a few large-scale machine learning applications; now it has more than 1,000 different applications.

The Data Analysis Bureau

We are a Data Science and Data Engineering Innovation Agency specialising in Machine Learning.