MLOps: From Model-centric to Data-centric AI Review

This is a review of Andrew Ng's talk "MLOps: From Model-centric to Data-centric AI."

AI System = Code + Data

Data is Food for AI

[Image: DALL·E-generated illustration, "Data is Food for AI"]

If we compare AI modeling to cooking, preparing data is like preparing the ingredients: good food requires good ingredients.

Much of what data scientists and engineers do is data preprocessing. Cleaning data and getting it into the form needed for training takes a lot of time. To make this work efficient, it helps to look at the life cycle of an ML project.

Lifecycle of an ML Project

Collect Data: Define and collect data

  • Is the data labeled consistently?

In the iguana detection example from the talk, different labelers can annotate the same image in different ways. Labeling therefore needs a systematic process, such as clear labeling guidelines shared by all labelers, to keep labels consistent.
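One simple way to find inconsistent labels is to have several labelers annotate the same items and flag the items where they disagree. The sketch below is only an illustration and is not from the talk: the function name `find_inconsistent_labels`, the item ids, and the label values are all hypothetical.

```python
from collections import Counter


def find_inconsistent_labels(annotations):
    """Return items whose labelers disagree.

    `annotations` maps an item id to the list of labels that different
    labelers assigned to it. Each flagged item maps to a tuple of
    (majority label, fraction of labelers who chose it).
    """
    flagged = {}
    for item_id, labels in annotations.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement < 1.0:
            flagged[item_id] = (top_label, agreement)
    return flagged


# Hypothetical annotations: how each labeler boxed the iguanas in an image.
annotations = {
    "img_001": ["one_box", "one_box", "one_box"],
    "img_002": ["one_box", "two_boxes", "two_boxes"],
    "img_003": ["two_boxes", "two_boxes", "two_boxes"],
}

flagged = find_inconsistent_labels(annotations)
print(flagged)  # only img_002 is flagged, with two of three labelers agreeing
```

Flagged items are good candidates for revisiting the labeling guidelines, since disagreement usually means the instructions were ambiguous for that case.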

  • Small Data and Label Consistency

The smaller the dataset, the greater the impact of noisy labels. For datasets with fewer than 10,000 examples, there is a lot of room to improve model performance by fixing label noise.
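A rough way to see why noise matters more with small data is a toy model, which is my own simplifying assumption and not something from the talk: suppose the model effectively averages n labels, each carrying independent noise with standard deviation `noise_std`. The error of that average then shrinks as `noise_std / sqrt(n)`.

```python
import math


def standard_error(noise_std, n_examples):
    """Standard error of an average over n independent noisy labels (toy model)."""
    return noise_std / math.sqrt(n_examples)


# With the same label noise, more data averages the noise away.
for n in (100, 10_000, 1_000_000):
    print(n, standard_error(0.5, n))

# Halving the label noise helps as much as collecting four times the data.
cleaner_labels = standard_error(0.25, 100)
more_data = standard_error(0.5, 400)
print(math.isclose(cleaner_labels, more_data))  # True
```

Under this assumption, when collecting four times as much data is not realistic, cleaning the labels is the cheaper way to get the same gain.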

Train model

In the model-centric view, the question is how to tune the model to improve performance. In the data-centric view, the question is how to modify the data to improve performance.

To manage data effectively, we need to change it systematically: train the model, analyze the outcomes, and look for ways to improve the data.
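The "analyze the outcomes" step can be sketched as a per-slice error analysis: group the evaluation results by some tag and see where the model does worst, then focus data improvements there. This is a minimal sketch under my own assumptions; the function `accuracy_by_slice`, the slice names, and the records are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Compute accuracy per data slice.

    `records` is an iterable of (slice_name, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, y_true, y_pred in records:
        total[slice_name] += 1
        correct[slice_name] += int(y_true == y_pred)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical evaluation results, tagged by recording condition.
records = [
    ("clean_audio", "yes", "yes"),
    ("clean_audio", "no", "no"),
    ("clean_audio", "yes", "yes"),
    ("car_noise", "yes", "no"),
    ("car_noise", "no", "no"),
    ("car_noise", "yes", "no"),
]

scores = accuracy_by_slice(records)
worst_slice = min(scores, key=scores.get)
print(worst_slice)  # car_noise
```

The worst slice tells us which data to collect, relabel, or augment before the next training run, which is what makes the loop systematic.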

Deploy: Deploy and maintain system

After deployment, we continuously monitor for performance degradation and find out which data is effective. The dataset is updated periodically, and the model is retrained to match.
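As a minimal sketch of such monitoring, we could compare accuracy on a recent window of production predictions against the accuracy measured at deployment time and flag the model when the drop is too large. The function name `needs_retraining` and the `max_drop` threshold are hypothetical, and the sketch assumes ground-truth feedback for recent predictions is available.

```python
def needs_retraining(baseline_accuracy, recent_outcomes, max_drop=0.05):
    """Flag performance degradation on recent production data.

    `recent_outcomes` is a list of booleans, True when a recent
    prediction turned out to be correct. Returns True when accuracy
    has dropped by more than `max_drop` from the baseline.
    """
    if not recent_outcomes:
        return False  # no feedback yet, nothing to conclude
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return (baseline_accuracy - recent_accuracy) > max_drop


# Hypothetical numbers: 0.92 accuracy at deployment time.
print(needs_retraining(0.92, [True] * 9 + [False] * 1))  # False (0.90 recent)
print(needs_retraining(0.92, [True] * 8 + [False] * 2))  # True  (0.80 recent)
```

When the check fires, the recent production data is exactly the data worth inspecting and adding to the next version of the dataset.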

Making it systematic: The rise of MLOps

Traditional software engineering is mainly concerned with code, and the practice of managing that code systematically is called DevOps.

ML engineering, however, must manage data as well as code. This is called MLOps. MLOps should support every phase of the life cycle described above.

The most important role of MLOps is to ensure a high-quality dataset throughout the ML system life cycle.

Good data:

  • Defined consistently
  • Covers important cases
  • Has timely feedback from production data
  • Sized appropriately
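Parts of this checklist can be checked automatically. As one hedged illustration, the sketch below looks at coverage and size only: given a tag for each example, it reports the cases we care about that have too few examples. The function `coverage_gaps`, the case names, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def coverage_gaps(case_tags, required_cases, min_count):
    """Return required cases with fewer than `min_count` examples.

    `case_tags` holds one tag per example in the dataset. The result
    maps each under-covered case to its current example count.
    """
    counts = Counter(case_tags)
    # Counter returns 0 for cases that never appear in the dataset.
    return {
        case: counts[case]
        for case in required_cases
        if counts[case] < min_count
    }


# Hypothetical dataset: 5 daytime images, 2 night images, no rain images.
case_tags = ["day"] * 5 + ["night"] * 2
gaps = coverage_gaps(case_tags, ["day", "night", "rain"], min_count=3)
print(gaps)  # {'night': 2, 'rain': 0}
```

Checks for consistent definitions and timely production feedback need labeler agreement and monitoring data, so they are outside this small sketch.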

Takeaways

[Screenshot from the talk "A Chat with Andrew on MLOps: From Model-centric to Data-centric AI" at 47:44]

In Data-centric AI, we have to think about how to systematically change the data to improve performance.

Personal Thought

[Image: DALL·E-generated film still of a young man preprocessing a dataset]

AI research has been dominated by large datasets and large models, but even large models trained on web data cannot solve every problem. Data from practical domains is often hard to obtain, so it is unlikely to have been included in their training data.

Therefore, for ML to be useful in practice, we need to study how to get good performance from small data. To do that, it seems necessary to study data-centric AI and to develop its techniques further.

Reference

MLOps: From Model-centric to Data-centric AI