
MLOps (Machine Learning Operations) is a set of practices that combines machine learning, DevOps and data engineering to streamline the process of taking machine learning models from development to production.

𝗠𝗟𝗢𝗽𝘀 𝗲𝗻𝘀𝘂𝗿𝗲𝘀 𝗿𝗲𝗹𝗶𝗮𝗯𝗹𝗲 𝗮𝗻𝗱 𝗳𝗮𝘀𝘁𝗲𝗿 𝗱𝗲𝗹𝗶𝘃𝗲𝗿𝘆 𝗼𝗳 𝗵𝗶𝗴𝗵-𝗾𝘂𝗮𝗹𝗶𝘁𝘆 𝗠𝗟 𝗺𝗼𝗱𝗲𝗹𝘀 𝘁𝗼 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻.

To understand MLOps, it is important to first understand the ML systems lifecycle, which involves several key concepts:

1/ 𝗗𝗮𝘁𝗮 𝗦𝗼𝘂𝗿𝗰𝗲𝘀
Data sources refer to the origins from which data is collected for machine learning. These can include databases, APIs, data lakes, and external datasets, and the data itself may be structured or unstructured. The choice of data sources is crucial because it directly impacts the quality and relevance of the data used to train models.
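
As a rough illustration, the sketch below pulls raw data from two hypothetical sources, a SQLite table and a CSV export, and combines them with pandas. The file, table, and column names (warehouse.db, orders, exports/clickstream.csv, user_id, ts) are assumptions made up for this example:

# Minimal sketch: collecting raw data from two common source types (assumed names).
import sqlite3

import pandas as pd

# Structured source: a relational database queried through SQL.
conn = sqlite3.connect("warehouse.db")  # hypothetical warehouse file
orders = pd.read_sql("SELECT user_id, amount, ts FROM orders", conn)
conn.close()

# Flat-file source: a CSV export, e.g. from an external provider or a data lake.
clicks = pd.read_csv("exports/clickstream.csv", parse_dates=["ts"])

# Combine the sources into one raw dataset for downstream feature engineering.
raw = orders.merge(clicks, on="user_id", how="left")
print(raw.head())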

2/ 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁
Model deployment is the process of integrating a trained machine learning model into a production environment where it can make predictions based on new data. This involves setting up the necessary infrastructure, such as APIs or web services, to serve the model to end-users or applications.
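
As a deliberately minimal sketch, the snippet below wraps a previously trained and serialized model in an HTTP endpoint using FastAPI. The model file model.joblib, the feature names, and the module name serve.py are assumptions; a real deployment would also add input validation, authentication, logging, and monitoring:

# Minimal sketch of serving a trained model behind an HTTP API (assumed artifact and features).
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # model trained and serialized elsewhere

class PredictionRequest(BaseModel):
    # Input features the (hypothetical) model expects, in training-time order.
    amount: float
    n_orders: int

@app.post("/predict")
def predict(req: PredictionRequest):
    features = [[req.amount, req.n_orders]]   # arrange the request fields as one row
    prediction = model.predict(features)[0]   # run inference on the new data
    return {"prediction": float(prediction)}

# Run locally with: uvicorn serve:app --reload   (assuming this file is named serve.py)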

3/ 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
Feature engineering involves creating and selecting relevant features (input variables) from raw data to enhance the performance of machine learning models. This process includes transforming data, handling missing values, and encoding categorical variables to ensure that the model can learn effectively from the data.
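
A minimal scikit-learn sketch of this step might look as follows; the column names are assumptions, but the pattern (impute missing values, scale numeric columns, one-hot encode categorical ones) is a standard one:

# Minimal feature-engineering sketch with scikit-learn (assumed column names).
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["amount", "n_orders"]          # assumed numeric features
categorical_cols = ["country", "device_type"]  # assumed categorical features

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # put features on a common scale
])

categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),  # encode categories as 0/1 columns
])

preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_cols),
    ("cat", categorical_pipeline, categorical_cols),
])

# preprocess.fit_transform(raw_df) would yield the model-ready feature matrix.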

4/ 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁
Model development encompasses the entire process of designing (coding), training, and validating machine learning models. This includes selecting algorithms, tuning hyperparameters, and evaluating performance with appropriate metrics to ensure that the model meets the desired levels of accuracy and reliability.
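
To make these steps tangible, here is a small sketch using scikit-learn on a synthetic dataset: split the data, tune hyperparameters with cross-validation, and evaluate the best model on held-out data. The algorithm, parameter grid, and metric are illustrative choices, not a recommendation:

# Minimal model-development sketch: split, tune, evaluate (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter tuning: search a small grid with 5-fold cross-validation.
param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate the best model on data it has never seen.
best_model = search.best_estimator_
print("Best params:", search.best_params_)
print("Test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))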

5/ 𝗗𝗮𝘁𝗮 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲 (𝗦𝗲𝗿𝘃𝗲 𝗮𝗻𝗱 𝗖𝗼𝗻𝘀𝘂𝗺𝗲)
A data pipeline is a series of data processing steps that move data from its source to the machine learning model for training and inference. The “serve” aspect refers to how data is made available to the model for predictions, while “consume” refers to how the model receives and processes this data to generate outputs. Efficient data pipelines ensure that data flows seamlessly and is transformed appropriately for use in machine learning applications.
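
A skeletal batch version of such a pipeline might look like the sketch below, where extract and transform prepare the data, "serve" hands the prepared features to the model, and "consume" is the model turning them into predictions. The file names, column names, and model artifact are assumptions for the example:

# Minimal batch data-pipeline sketch (assumed files, columns, and model artifact).
import joblib
import numpy as np
import pandas as pd

def extract() -> pd.DataFrame:
    # Extract: pull the latest raw records (here a CSV drop; could be a database or stream).
    return pd.read_csv("exports/latest_orders.csv")

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Transform: apply the same cleaning and feature steps used at training time.
    features = raw.copy()
    features["amount"] = features["amount"].fillna(features["amount"].median())
    features["log_amount"] = np.log1p(features["amount"].clip(lower=0))
    return features[["amount", "log_amount"]]

def serve_and_consume(features: pd.DataFrame) -> pd.DataFrame:
    # Serve: make the prepared features available to the model;
    # Consume: the model processes them and emits predictions for downstream use.
    model = joblib.load("model.joblib")
    return features.assign(prediction=model.predict(features))

if __name__ == "__main__":
    scored = serve_and_consume(transform(extract()))
    scored.to_csv("exports/predictions.csv", index=False)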
