MLflow - Configuration and Quick Start

Introduction to MLflow: Streamlining Machine Learning Operations

Machine learning (ML) projects can quickly get complex — there's a lot to manage: from tracking experiments to organizing code and deploying models. That's where MLflow comes in. MLflow is an open-source platform designed to make managing the entire machine learning workflow easier and more organized. It helps you track experiments, package your code, and deploy models with a lot less hassle.

Whether you're training models, experimenting with different algorithms, or deploying them to production, MLflow makes the process smoother and more manageable.

Applications of MLflow

MLflow provides a set of tools that cover all stages of the machine learning workflow, from the initial experiment to deployment. It works with many popular ML frameworks and is flexible enough to be used in different environments, whether you're working on a small individual project or a large one within a developer team.

Key Features of MLflow

MLflow Tracking:
- Tracking Experiments: MLflow helps you log and track important details from your experiments, such as the parameters you used, the results (metrics), and the files (artifacts) generated. This way, you can keep a clear record of what works best and easily compare different experiments.
- Versioning: MLflow automatically keeps track of each experiment run. This means you can go back and see how your experiments evolved over time and replicate past results when needed.
MLflow Projects:
- Reproducibility: MLflow Projects help you package your code and all its dependencies in a standard format. This ensures that your code can be run anywhere—on someone else's machine or in a different environment—without worrying about missing dependencies or setup issues.
- Consistency Across Environments: With MLflow Projects, you can run your code on any system, whether it is your local machine, a cloud platform, or a server, and it will work the same way each time.
MLflow Models:
- Managing Models: MLflow lets you store, track, and manage machine learning models, regardless of the framework you use. Whether you're using TensorFlow, Scikit-learn, PyTorch, or any other library, MLflow helps keep your models organized and ready for deployment.
- Cross-Platform Support: Once a model is saved, MLflow makes it easy to deploy it on different platforms, whether it is on a local server or a cloud service like AWS or Azure.
MLflow Model Registry:
- Centralized Model Management: MLflow’s Model Registry acts as a central place where all your models are stored and tracked. You can version models, check metadata, and control access to them as they move through different stages (like development, staging, and production).
- Model Lifecycle: With the registry, you can track and manage how models evolve over time, making sure only approved models are used in production.
MLflow Deployment:
- Easy Deployment: MLflow simplifies deploying machine learning models to different platforms. Whether you're deploying to cloud services, on-premise servers, or creating an API for a model, MLflow makes it easier to serve models in production environments.

Setting up your MLflow Tracking Server

To set up the MLflow tracking server, there are a few key components we will need to configure:

A Virtual Machine (VM): The virtual machine will act as the host environment where MLflow will be installed and run. On this VM, we will be able to access the MLflow user interface (UI), track experiments, and deploy machine learning models. This machine will serve as the primary workspace for all MLflow operations, including running experiments, storing logs, and managing models.
A Database for Tracking Metadata: MLflow requires a database to store metadata related to experiments, such as parameters, metrics, and the details of each model run. This is where MLflow will track the history of experiments to allow for comparison and versioning. MLflow supports specific databases, including MySQL, PostgreSQL, and SQLite. You’ll need to set up one of these databases to ensure that all your experiment data is properly recorded and accessible.
Artifact Storage for Models and Files: MLflow also requires a place to store artifacts, which typically include trained machine learning models, model versions, logs, and other relevant files produced during the experimentation process. While it is possible to store artifacts on the virtual machine itself, a more common and scalable approach is to use cloud storage solutions like Amazon S3.

Before installing MLflow, we first need to ensure that the virtual machine is up and running. Once the virtual machine is set up, we can install MLflow, configure it to interact with the chosen database for metadata tracking, and configure artifact storage. With everything in place, we can begin using MLflow to track experiments, serve models, and manage our machine learning lifecycle efficiently.
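
The three components above map onto a handful of `mlflow server` options. As a rough sketch (not the only way to organize this), the settings can be collected in one place and turned into the command line. The class name `TrackingServerConfig` and the bucket name `s3://my-mlflow-artifacts` are assumptions made for illustration; the PostgreSQL URI matches the database created later in this tutorial.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackingServerConfig:
    """Settings for one `mlflow server` invocation (all values are placeholders)."""
    backend_store_uri: str                       # database holding run metadata
    artifacts_destination: Optional[str] = None  # e.g. an S3 bucket; None keeps the default
    host: str = "0.0.0.0"
    port: int = 8080

    def to_command(self) -> List[str]:
        """Return the argument list for `mlflow server`."""
        cmd = [
            "mlflow", "server",
            "--backend-store-uri", self.backend_store_uri,
            "--host", self.host,
            "--port", str(self.port),
        ]
        if self.artifacts_destination:
            cmd += ["--artifacts-destination", self.artifacts_destination]
        return cmd


config = TrackingServerConfig(
    backend_store_uri="postgresql://mlflowuser:password@localhost/mlflowdb",
    artifacts_destination="s3://my-mlflow-artifacts",  # hypothetical bucket name
)
print(" ".join(config.to_command()))
```

The resulting list can be passed to `subprocess.run()` on the VM, or simply printed and pasted into a shell.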


Steps to Set Up MLflow on OpenStack through a VPN

Set Up a VPN Connection
- Establish VPN Connectivity: First, ensure you have VPN access to your OpenStack environment. This could involve connecting to the VPN using a client (like OpenVPN), using credentials and configuration files as described here.
Create a Virtual Machine (VM) in OpenStack
- Once connected to the VPN, log in to the OpenStack Dashboard (dashboard.cloud.lxp.lu) using your user ID and password. Information is also provided here.
Launch a Virtual Machine
- Go to the "Instances" section and click on “Launch Instance.” Select the desired operating system (usually Ubuntu or CentOS for MLflow installations).
- Choose the appropriate flavor based on your VM's required resources (CPU, memory, storage).
- [Optional]: Allocate a floating IP to allow external access to the instance if necessary.

Installation and Configuration of the MLflow tracking server on the compute engine instance (VM)

Connect to your VM
- Once the VM is launched, you can SSH into it using the private or floating IP address provided. In the example SSH command below, username is the account name (ubuntu is the default account for the Ubuntu 24.0 instance) and remote_host is the instance IP address (10.40.0.178 in this tutorial):
```
ssh -i "path/to/private-key" username@remote_host
```
Update your VM
- Start by updating the package list and upgrading the system to make sure everything is up to date:
```
    sudo apt update && sudo apt upgrade -y
```
- Install Dependencies: Install the packages needed to run MLflow, such as python3-pip and python3-venv:
```
    sudo apt install python3-pip python3-dev python3-venv libmysqlclient-dev -y
```
Create a Virtual Environment and Install the Required Packages
- Use the python3 -m venv command followed by the name of your mlflow virtual environment:
```
    python3 -m venv mlflow
```
- Activate your virtual environment:
```
    source mlflow/bin/activate
```
- Install MLflow using pip:
```
    pip3 install mlflow
```
- After the installation is complete, verify that MLflow is installed successfully:
```
    mlflow --version
```
- Use pip to install psycopg2, a PostgreSQL database adapter for Python:
```
    pip3 install psycopg2-binary
```
Set Up the Database for Tracking
- Choose a Database: MLflow supports MySQL, PostgreSQL, and SQLite. For a production-ready setup, MySQL or PostgreSQL is recommended. In this example we will use PostgreSQL. You can install it using the following commands:
```
    sudo apt install postgresql postgresql-contrib -y
    sudo service postgresql start
```

Create a Database for MLflow

Log into PostgreSQL and create a database for MLflow:

    sudo -u postgres psql
    CREATE DATABASE mlflowdb;
    CREATE USER mlflowuser WITH PASSWORD 'password';
    ALTER ROLE mlflowuser SET client_encoding TO 'utf8';
    ALTER ROLE mlflowuser SET default_transaction_isolation TO 'read committed';
    ALTER ROLE mlflowuser SET timezone TO 'UTC';
    GRANT ALL PRIVILEGES ON DATABASE mlflowdb TO mlflowuser;
    GRANT USAGE ON SCHEMA public TO mlflowuser;
    GRANT CREATE ON SCHEMA public TO mlflowuser;
    ALTER ROLE mlflowuser CREATEDB;
    \q
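
The user, password, and database created above are later combined into a connection URI for the tracking server. If the real password contains characters such as "@" or "/", the URI needs them percent-encoded. A minimal sketch, assuming the default PostgreSQL port 5432; the helper name `postgres_backend_uri` and the second password are made up for illustration:

```python
from urllib.parse import quote_plus


def postgres_backend_uri(user: str, password: str, database: str,
                         host: str = "localhost", port: int = 5432) -> str:
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Credentials from this tutorial
print(postgres_backend_uri("mlflowuser", "password", "mlflowdb"))

# A hypothetical password whose "@" and "/" would otherwise break URI parsing
print(postgres_backend_uri("mlflowuser", "p@ss/word", "mlflowdb"))
```

The returned string is what would be passed to the `--backend-store-uri` option of `mlflow server`.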

Configure and Run the MLflow Tracking Server
- By default, the mlflow tracking server uses the filesystem to log metadata of the ML experiments and runs. Logs by default are stored under the ./mlruns directory. To configure the mlflow tracking server using the default configuration parameters use:
```
    mlflow server \
        --host 0.0.0.0 \
        --port 8080
```
- A different configuration can be obtained by adding a backend database. In this example, we will configure the mlflow tracking server pointing to a local PostgreSQL database. When starting MLflow, specify the connection URL for the PostgreSQL database:
```
    mlflow server \
        --backend-store-uri postgresql://mlflowuser:password@localhost/mlflowdb \
        --host 0.0.0.0 \
        --port 8080
```
- Once you start the above service you will see an output similar to:
```
[2025-01-17 12:29:13 +0000] [2044] [INFO] Starting gunicorn 23.0.0
[2025-01-17 12:29:13 +0000] [2044] [INFO] Listening at: http://0.0.0.0:8080 (2044)
[2025-01-17 12:29:13 +0000] [2044] [INFO] Using worker: sync
[2025-01-17 12:29:13 +0000] [2045] [INFO] Booting worker with pid: 2045
[2025-01-17 12:29:13 +0000] [2046] [INFO] Booting worker with pid: 2046
[2025-01-17 12:29:13 +0000] [2047] [INFO] Booting worker with pid: 2047
[2025-01-17 12:29:13 +0000] [2048] [INFO] Booting worker with pid: 2048
```
- Access the MLflow UI. Open a browser and navigate to the IP address of your VM with port 8080 (e.g., http://10.40.0.178:8080). You should now see the MLflow UI, where you can track experiments, view metrics, and manage models. Once your MLflow server is up and running, you can begin logging experiments, tracking metrics, and saving models.
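
Besides opening the UI in a browser, it can be handy to check from a script that the server is reachable, for example before submitting a training job. The sketch below assumes the tracking server answers on a `/health` path, as recent MLflow releases do; the function names are illustrative and only the standard library is used.

```python
import urllib.request


def health_url(tracking_uri: str) -> str:
    """Return the health-check URL for a tracking server base address."""
    return tracking_uri.rstrip("/") + "/health"


def server_is_up(tracking_uri: str, timeout: float = 5.0) -> bool:
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and timeouts all derive from OSError
        return False


print(health_url("http://10.40.0.178:8080/"))
```

Calling `server_is_up("http://10.40.0.178:8080")` from a machine connected to the VPN should return True once the server has started.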

Train and Log your First Model with MLflow

In this section, we will guide you through the process of using MLflow to log, track, and compare machine learning models built using Scikit-learn. We will use simulated data to train multiple models, log hyperparameters, metrics, and artifacts (like visualizations), and use MLflow to manage and compare the experiments. To follow this tutorial, you’ll need to write and edit Python code in a script. We will run the MLflow session on the Meluxina supercomputer and connect it to the cloud MLflow tracking server (running on the OpenStack compute engine instance) using the appropriate APIs.

Connect to the Meluxina Supercomputer and Load the Required modules
- First, connect to the machine, request an interactive job on the cpu partition, and load the required MLflow module:
```
# Request an interactive job
salloc -A [p200xxx-replace-with-your-project-number] -t 01:00:00 -q dev -p cpu --res=cpudev  -N1
module load env/release/2024.1
module load MLflow
```
Create and Set Up a Python Virtual Environment
- To start, create a virtual environment for the ML project. This keeps dependencies isolated and makes the project environment reproducible. Then install the required libraries in the virtual environment:
```
python3 -m venv mlflow-train
source mlflow-train/bin/activate
pip3 install scikit-learn pandas matplotlib
```

Import Libraries

Once the environment is set up and all dependencies are installed, you can write and edit your Python script. The code below provides the imports needed to start building your ML models and logging them with MLflow:

import pathlib
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression

Data Simulation

Generate a synthetic dataset for a random regression problem using the make_regression method from scikit-learn:

def simulate():
    # Create simulated regression data                                                                                                                      
    X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=42)

    # Split the data into training and testing sets                                                                                                         
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    return X_train, X_test, y_train, y_test

Define the Remote MLflow Tracking Server
- You can specify the remote server URI in your Python script using the following call:
```
mlflow.set_tracking_uri("http://your-remote-server-address:8080")
print("Tracking URI set to:", mlflow.get_tracking_uri())
```
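
Hard-coding the server address works for a single script, but the same code is often run against different servers. MLflow also reads the `MLFLOW_TRACKING_URI` environment variable, so one possible pattern is to prefer that variable and fall back to a default. This is a sketch only; the helper name and the default value (the VM address used in this tutorial) are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_TRACKING_URI = "http://10.40.0.178:8080"  # VM address used in this tutorial


def resolve_tracking_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or DEFAULT_TRACKING_URI


print("Tracking URI resolved to:", resolve_tracking_uri())
```

In a training script the result would be passed on with `mlflow.set_tracking_uri(resolve_tracking_uri())`.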
Define your First Experiment
- An experiment in MLflow is essentially a container for organizing and tracking your machine learning runs. A run represents a single execution of a piece of code that performs tasks like training a model, evaluating its performance, and logging metrics and parameters. Multiple runs can belong to the same experiment, allowing you to compare their results. To initiate or set an experiment use:
```
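# set_experiment() makes the experiment active and creates it first if it does
# not exist yet. Calling create_experiment() on an existing name raises an error,
# so it is left out here to keep the script re-runnable.
mlflow.set_experiment('my_first_experiment')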
```
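
When the experiment ID is needed explicitly, a common pattern is "look up by name, create only if missing". The sketch below takes the tracking API as an argument: in real use this would be the `mlflow` module (or an `mlflow.MlflowClient` instance), both of which provide `get_experiment_by_name()` and `create_experiment()`. The `InMemoryTracking` class is only a stand-in so the example runs without a server.

```python
from types import SimpleNamespace


def get_or_create_experiment(client, name: str) -> str:
    """Return the ID of the experiment called `name`, creating it if needed."""
    experiment = client.get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return client.create_experiment(name)


class InMemoryTracking:
    """Stand-in for the mlflow module, used only to demonstrate the helper offline."""

    def __init__(self):
        self._ids = {}

    def get_experiment_by_name(self, name):
        exp_id = self._ids.get(name)
        return None if exp_id is None else SimpleNamespace(experiment_id=exp_id)

    def create_experiment(self, name):
        self._ids[name] = str(len(self._ids) + 1)
        return self._ids[name]


fake = InMemoryTracking()
first = get_or_create_experiment(fake, "my_first_experiment")
second = get_or_create_experiment(fake, "my_first_experiment")
print(first, second)  # the second call reuses the experiment created by the first
```

With a real server, `get_or_create_experiment(mlflow, "my_first_experiment")` would be called after `import mlflow` and after the tracking URI has been set.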

Define and Train your Models in an MLflow Session

In this example, we will train several models from scikit-learn (Decision Tree, Random Forest, Support Vector Machine, KNN, and Ridge Regression) and log them to MLflow. Define a dictionary of the models to train using:

models = {
    "Decision Tree": DecisionTreeRegressor(max_depth=5),
    "Random Forest": RandomForestRegressor(n_estimators=100, max_depth=5),
    "Support Vector Machine (SVM)": SVR(C=1.0, kernel='linear', epsilon=0.1),
    "K-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    "Ridge Regression": Ridge(alpha=1.0)
    }

Start the training and use mlflow.start_run() to create a separate run for each model:

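# Generate the train/test split with the simulate() function defined above
X_train, X_test, y_train, y_test = simulate()

# Make sure the folder used for temporary artifacts exists
pathlib.Path("./tmp").mkdir(parents=True, exist_ok=True)

for model_name, model in models.items():
    with mlflow.start_run():
        print(model_name)

        # Plot the first feature against the second one (columns, not rows)
        fig, ax = plt.subplots()
        ax.scatter(X_train[:, 0], X_train[:, 1])
        ax.set_title("Feature Scatter Plot", fontsize=14)
        plt.tight_layout()
        save_path = pathlib.Path("./tmp/scatter_plot.png")
        fig.savefig(save_path)
        plt.close(fig)

        # Train the model
        model.fit(X_train, y_train)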
        signature = infer_signature(X_train, model.predict(X_train))

        # Make predictions                                                                                                                      
        y_pred = model.predict(X_test)

        # Calculate Mean Squared Error (MSE) and R²
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)

        # Log the model to MLflow                                                     
        mlflow.sklearn.log_model(model, model_name, signature=signature)

        # Log hyperparameters and metrics                                          
        mlflow.log_param("model_type", model_name)
        mlflow.log_param("parameters", model.get_params())
        mlflow.log_metric("mse", mse)
        mlflow.log_metric("r2", r2)
        mlflow.log_artifact("./tmp/scatter_plot.png")
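
The loop above stores all hyperparameters as a single string under the key "parameters", which makes them hard to filter or compare in the UI. One alternative, sketched here under the assumption that each hyperparameter should become its own entry, is to flatten the dictionary first. The helper name `flatten_params` and the example values are illustrative.

```python
from typing import Any, Dict


def flatten_params(model_name: str, params: Dict[str, Any]) -> Dict[str, str]:
    """Turn a hyperparameter dict into one string-valued entry per parameter."""
    flat = {"model_type": model_name}
    for key, value in sorted(params.items()):
        flat[key] = str(value)
    return flat


# Hypothetical hyperparameters, in the shape returned by model.get_params()
example = flatten_params("Ridge Regression", {"alpha": 1.0, "fit_intercept": True})
print(example)
```

Inside the run, `mlflow.log_params(flatten_params(model_name, model.get_params()))` could then replace the two `mlflow.log_param()` calls.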

Summary of Key MLflow methods used:

- mlflow.start_run(): Starts a new run within the active experiment.
- infer_signature() (imported from mlflow.models): Infers the input and output schema from the training data and the model’s predictions. By logging the signature along with the model, you create a self-contained model package that includes the expected input-output schema, which simplifies model versioning, validation, and deployment.
- mlflow.sklearn.log_model(): Logs the trained scikit-learn model, including the signature, so the input-output data schema is tracked.
- mlflow.log_param(): Logs hyperparameters, such as the type of model.
- mlflow.log_metric(): Logs metrics, such as the MSE and R² score.
- mlflow.log_artifact(): Logs artifacts (e.g., CSV files, figures, images).

Compare your Trained Models

In a different setup, we can compare the metrics of the five models and log the result in a single run. In the example below, we use a bar chart to visualize the metrics of the five models and the log_artifact() method to log the figure:

model_comparison = {
    "Decision Tree": {"mse": 0, "r2": 0},
    "Random Forest": {"mse": 0, "r2": 0},
    "Support Vector Machine (SVM)": {"mse": 0, "r2": 0},
    "K-Nearest Neighbors": {"mse": 0, "r2": 0},
    "Ridge Regression": {"mse": 0, "r2": 0}
}

for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    model_comparison[model_name]["mse"] = mse
    model_comparison[model_name]["r2"] = r2

with mlflow.start_run():
    # Convert to DataFrame for visualization                                                                                                                               
    comparison_df = pd.DataFrame(model_comparison).T
    comparison_df.plot(kind="bar", figsize=(10, 6), title="Model Comparison: MSE and R²")
    plt.ylabel("Value")
    plt.savefig("./tmp/model_comparison.png")
    mlflow.log_artifact("./tmp/model_comparison.png")
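
The comparison dictionary can also be used to pick a winner programmatically rather than reading it off the chart. A minimal sketch, assuming the same `{"model name": {"mse": ..., "r2": ...}}` layout as `model_comparison`; the numbers below are illustrative only and are not results from this tutorial.

```python
def best_model(comparison: dict, metric: str = "mse", lower_is_better: bool = True) -> str:
    """Return the name of the model with the best value for `metric`."""
    pick = min if lower_is_better else max
    return pick(comparison, key=lambda name: comparison[name][metric])


# Illustrative numbers only
example = {
    "Decision Tree": {"mse": 310.0, "r2": 0.95},
    "Ridge Regression": {"mse": 0.02, "r2": 0.99},
}

print(best_model(example))                             # lowest MSE
print(best_model(example, "r2", lower_is_better=False))  # highest R²
```

Within the comparison run, the result could be recorded with `mlflow.set_tag("best_model", best_model(model_comparison))`.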

View your Experiments and Runs in the MLflow UI

Once you have logged the models, parameters, and metrics, you can visualize and compare the different runs in the MLflow UI. Access the MLflow UI at the IP address of your VM with port 8080 (e.g., http://10.40.0.178:8080). The central dashboard providing information about your experiments and runs is shown as follows:
You can choose a specific run from the list of runs. A new dashboard shows the parameters linked to the run ID, together with the metrics evaluating the model associated with that run:
You can also navigate to the run comparing your models to see the resulting image logged as an artifact:
A model URL can be found in the artifacts section of the associated run. You can use this URL and the script provided in the next section to load and serve your trained model. Input and output types and sizes are also provided in this dashboard:

Load and Serve a Logged Model

After logging a model in MLflow (as demonstrated in the previous section), you can load it for inference. Depending on the framework used, MLflow provides different methods to load models. In this section, we'll focus on loading scikit-learn models, though the approach is similar for models from other frameworks such as TensorFlow, PyTorch, etc.

To load a model, you can use the mlflow.sklearn.load_model() function for scikit-learn models, or the more generic mlflow.pyfunc.load_model() for models of other types.

Here is an example of loading and using a scikit-learn model:

import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://10.40.0.178:8080")

# Load the Ridge Regression model using the model's run ID                                                                                                                                                                                                                                                                                                                              
run_id = "db1df5133102413987754517776ccdcb"  # Replace with the actual run ID                                                                                                                                                                                                                                                                                                           
model_uri = f"runs:/{run_id}/Ridge Regression"
print(model_uri)
loaded_model = mlflow.sklearn.load_model(model_uri)

# Example data: one sample with the five features used during training
new_data = [[1.5, -0.2, 0.3, 1.2, -0.5]]

# Predict with the loaded model
predictions = loaded_model.predict(new_data)
print(predictions)
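
A mistyped run ID is a common reason for a model failing to load. The sketch below builds the `runs:/` URI and rejects IDs that do not look like the 32-character hexadecimal strings MLflow normally generates; that format check is an assumption, and the helper name is illustrative.

```python
import re

_RUN_ID = re.compile(r"^[0-9a-f]{32}$")


def run_model_uri(run_id: str, artifact_path: str) -> str:
    """Build a runs:/ model URI after a basic sanity check on the run ID."""
    if not _RUN_ID.match(run_id):
        raise ValueError(f"Run ID does not look like a 32-character hex string: {run_id!r}")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


print(run_model_uri("db1df5133102413987754517776ccdcb", "Ridge Regression"))
```

The same URI can be given to the generic loader mentioned above, `mlflow.pyfunc.load_model()`, when the model's framework is not known in advance.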

Deploying and Serving the Model

Once your model is logged in MLflow, you can deploy it to a local server. For this example we will use the Openstack compute engine. Connect to your VM and use the following command to deploy your model:

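# <artifact_path> is the name given to log_model(), e.g. "Ridge Regression" in this tutorial
mlflow models serve -m "runs:/<run_id>/<artifact_path>" -p 8081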
#e.g. mlflow models serve -m mlflow-artifacts:/1/db1df5133102413987754517776ccdcb/artifacts/Ridge\ Regression --no-conda --host 0.0.0.0 -p 8081

You can find the path of your model in the runs section of the MLflow UI. To serve your model, it is important to expose a service from your VM through the VPN, following the instructions provided here.

Once the model is served, you can interact with it by sending HTTP requests. Here’s an example of how to send a prediction request to the model’s REST API:

import requests
import json

# Prepare the input data (example: data for the model)                                                                   
new_data = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2", "feature_3", "feature_4", "feature_5"],
        "data": [[1.5, -0.2, 0.3, 1.2, -0.5]]
        }
}

print(json.dumps(new_data))
# Send a POST request to the model's REST API endpoint                                                                   
response = requests.post(
    url="http://10.40.0.178:8081/invocations",
    headers={"Content-Type": "application/json"},
    data= json.dumps(new_data)
)

# Check the response                                                                                                     
print("Predictions:", response.json())
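
The request body and the response can be handled by two small helpers, which makes malformed inputs fail before they reach the server. This is a sketch under two assumptions: the model expects the "dataframe_split" format used above, and the server wraps its output in a "predictions" key, as MLflow 2.x scoring servers do. The helper names are illustrative.

```python
import json
from typing import List, Sequence


def build_payload(rows: Sequence[Sequence[float]], columns: Sequence[str]) -> str:
    """Serialize rows into the "dataframe_split" JSON body, checking row widths."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"Expected {len(columns)} values per row, got {len(row)}")
    body = {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}
    return json.dumps(body)


def extract_predictions(body: dict) -> List[float]:
    """Pull the predictions list out of a decoded response body."""
    if "predictions" not in body:
        raise ValueError(f"Unexpected response: {body}")
    return list(body["predictions"])


columns = [f"feature_{i}" for i in range(1, 6)]
payload = build_payload([[1.5, -0.2, 0.3, 1.2, -0.5]], columns)
print(payload)
```

The string returned by `build_payload()` can be sent as the `data` argument of `requests.post()`, and `extract_predictions(response.json())` applied to the reply.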

Conclusion

In this tutorial, we have explored how to use MLflow for managing machine learning experiments, models, and deployments with a remote tracking server. Key takeaways include:

Experiment Tracking: MLflow allows you to log parameters, metrics, and models during training, making it easy to monitor and compare experiments.
Remote Tracking Server: Setting up a remote MLflow server centralizes experiment data, enabling collaboration across teams and ensuring scalability for large projects.
Model Management: With the MLflow Model Registry, you can manage multiple model versions, track their lifecycle, and transition models between staging and production environments.
Reproducibility & Collaboration: MLflow’s ability to store and retrieve experiments ensures reproducibility and easy collaboration, even across distributed teams.

By setting up a remote server, you gain the ability to manage experiments at scale, automate workflows, and maintain a streamlined model deployment pipeline. MLflow helps improve the efficiency and reproducibility of machine learning projects, whether you're working individually or within a team.