How to: Automating Automated Machine Learning in Azure Part 1 – Scheduling AutoML runs

A key concern with machine learning projects is being able to adapt to changes in the real world. This is especially true when the problem you are trying to solve evolves constantly: model training is always done on historical data, which after a while may no longer represent the problem accurately. This is something that our friends at Dark Roast Ltd. discovered after having run their coffee pot prediction app for a few months: In the beginning the predictions were quite accurate, but as time passed, they found themselves having to brew more and more extra coffee since the predictions were constantly too low. Maybe the new coffee beans they bought were so delicious that people simply wanted to drink more coffee, or maybe their previously tea-loving British workers had switched over to coffee; they weren’t entirely sure. But what they did know was that the machine learning model they were using was getting outdated.

The solution to this problem is seemingly simple: re-train the model whenever enough new training data has been collected for the re-trained model to give better predictions. For Dark Roast Ltd. that simple solution came with a small problem, though: re-training the model always requires manual work when it comes to preparing the training data, performing the actual model training and finally deploying the new model to production. While it may not be the end of the world to do those steps manually once a week or once a month, it’s still boring, repetitive work that no one really wants to do. So, what’s one to do then?

Automate it all, of course.

Automate, automate it all!

This is the first part in a series of posts on automating Automated Machine Learning in Azure. In this part I will concentrate on performing Automated Machine Learning runs on a schedule, and in future posts I’ll go through automated model deployment and updating datasets for the training runs.

The Auto ML conundrum

Automated Machine Learning is an interesting feature of Azure Machine Learning for a couple of reasons. First, it performs model training and hyperparameter tuning for you, typically producing quite good models without requiring any extensive data science work from the person running the AutoML training. Second, Auto ML is particularly interesting for software developers because it performs something called featurization automatically. Auto ML’s featurization applies various modifications to the values in your training dataset to make the data more suitable for machine learning algorithms: it can hash string values, scale numerical values, replace missing values and more. These are things you are expected to do yourself when performing normal model training. And what’s more, once you’ve trained a machine learning model the usual way, the values you pass to the model to get predictions are expected to have gone through the same featurization process as well. So not only do you need to implement featurization while training the model, but also in the application which is using the trained model.

But that’s not the case with Auto ML, because the final model does featurization for you, and all your application needs to do is pass the original, un-featurized data into the model.
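To give a sense of what Auto ML handles for you here, below is a rough sketch of the kind of featurization pipeline you would otherwise build yourself, for example with scikit-learn. The column names are made up for illustration and have nothing to do with Dark Roast’s actual data:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical columns: "weekday" is a string, "office_headcount" is numeric
numeric_features = ["office_headcount"]
categorical_features = ["weekday"]

# Impute and scale numeric values, encode string values. The exact same
# transformations would also have to be applied to incoming data at
# prediction time, which is the part Auto ML's featurization saves you from.
featurization = ColumnTransformer([
    ("numeric", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_features),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])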

So, Auto ML is full of useful quality of life improvements that make using it much more attractive than normal machine learning. However, there is one issue to tackle with Auto ML: currently, automated machine learning runs can be done in only one of two ways in Azure: either manually through the Azure Machine Learning studio, or by using the Azure Machine Learning SDK for Python. Since our goal this time is to automate Auto ML, we have no choice but to skip the manual UI route and dive straight into Python. Which raises a question: Where are we going to run our Python scripts?

Azure provides a number of options for running Python code, from virtual machines to App Services using Web Apps or Functions. Personally, my goal is to do things as easily and cleanly as possible, so I decided to skip both virtual machines and web apps straight away. I gave Azure Functions a good try but found myself banging my head against a brick wall of “Out of memory” errors despite the app having plenty of free memory to spare. That seemed like a bug somewhere in the myriad of Python libraries involved, which hopefully will eventually be fixed. Finally, I discovered a solution that was even simpler than I dared to imagine: running the Auto ML training scripts via Azure Machine Learning graphical designer pipelines.

The entire solution is not quite that simple, though, because of one single thing: Python package dependencies. You can run Python scripts straight inside Azure Machine Learning’s graphical designer pipelines, but the graphical designer uses very specific versions of various Python packages when it sets up its Python execution environment. These package versions are likely to clash with whatever versions your script uses, especially if you plan on using the newest possible versions. As a result, you can’t run the Auto ML training script itself from the “Execute Python Script” module, but you can create a separate Azure Machine Learning experiment that runs the training script with its own execution environment and package dependencies. Let’s see what that looks like.

Running the Auto ML training script inside Azure Machine Learning…without human interaction

Azure Machine Learning provides you with compute instances, which are virtual machines where you can write Python scripts and execute them manually. If you’ve run Python in Azure ML, it’s most likely been inside one of these. Compute instances don’t work for us, though, since we are interested in automation, so we’ll have to look into compute clusters, which are used for running machine learning tasks in the background. So step-by-step our solution is going to look like this:

  1. An Azure Machine Learning graphical designer pipeline is run on a compute cluster
  2. The pipeline runs a custom Python script, which creates a new execution environment for our actual Auto ML training script
  3. The Auto ML training script is run on a compute cluster in its own environment
  4. The actual Auto ML training, started by the training script, is also run on a compute cluster in its own environment

If that sounds confusing, don’t worry – it’s actually quite simple once we get into it. In order to do all of this you will need the following:

  • A compute cluster with at least two nodes. Or two different compute clusters. The Auto ML training script which I’ll show later waits for the Auto ML training to finish, which reserves an extra node from a compute cluster. Alternatively, you can modify the training script to not wait for the training to finish, but waiting is useful if you want to do other actions once the training is completed (such as deploying the model automatically)!
  • A user assigned managed identity in Azure, which is given contributor permissions to your Azure Machine Learning workspace and which is assigned to the compute cluster you are using to run the Auto ML training script.
  • The Auto ML training script itself, in a zip archive, uploaded into Azure Machine Learning as a file dataset.
  • Training data for the Auto ML run as a CSV file in Azure Blob Storage, registered in Azure Machine Learning as a tabular dataset. I’ll assume that you already have some training data set up in Azure Machine Learning, so I’ll skip this part in this blog.

Creating a user assigned managed identity

The purpose of having a user assigned managed identity for our compute cluster is to allow the Auto ML training script to authenticate against Azure Machine Learning, so that it can actually start the model training operation. Even though you are running Python scripts in compute clusters inside Azure Machine Learning, those scripts aren’t automatically authenticated to use Azure Machine Learning’s APIs. And going with managed identities is both easier and more secure than using a service principal with a client ID and a client secret. Here’s how to create a user assigned managed identity and give it contributor permissions to Azure Machine Learning:

  1. In Azure Portal, create a new “User Assigned Managed Identity” resource. You’ll have to provide a resource group, a region and a name. I suggest selecting the same resource group and region as your Machine Learning workspace, and then picking a good descriptive name.
  2. Once your User Assigned Managed Identity resource has been provisioned, open it in Azure Portal and copy its client ID somewhere; you’ll need it later.
  3. Go to your Machine Learning workspace in Azure Portal and click “Access control (IAM).” In there, click “Add role assignments.”
  4. On the “Add role assignment” panel, select the Contributor role, set “Assign access to” to “User assigned managed identity” and then select your managed identity from the list below. Click “Save” to confirm the role assignment.

Configuring a compute cluster for automated Auto ML training

Chances are that if you are reading this post, you already have a compute cluster created in your Azure Machine Learning workspace, but we still have to make sure that it’s configured properly for what we want to do here. Namely, the cluster needs to have a maximum node count of at least 2, and we need to assign the new user assigned managed identity to the cluster as well. Here’s how to do it in the studio UI (if you prefer doing it from code, there’s a sketch after these steps):

  1. In Azure Machine Learning, navigate to the “Compute” page and open the “Compute clusters” tab. Click the name of the compute cluster you wish to use, then click “Edit.”
  2. On the “Update Compute” panel, verify that “Maximum number of nodes” is at least 2. Then toggle “Assign a managed identity”, click “User-assigned” and select the user assigned managed identity you previously created from the search box. Click “Update” to save the changes.
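If you would rather create the cluster from code with the identity already assigned, the SDK can do that too. This is only a rough sketch with placeholder names, VM size and identity resource ID:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Placeholder values: use your own cluster name, VM size and the full
# resource ID of the user assigned managed identity you created earlier
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=0,
    max_nodes=2,  # at least 2, as discussed above
    identity_type="UserAssigned",
    identity_id=["/subscriptions/<subscription>/resourcegroups/<resource group>/"
                 "providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity name>"])

cluster = ComputeTarget.create(ws, "automl-cluster", compute_config)
cluster.wait_for_completion(show_output=True)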

Preparing the Auto ML training script

Getting started with the Auto ML training script is easy – because I’ve got a template script ready for you. 🙂 Note that you might want to edit the script for your own purposes if you are performing different kinds of model training (here we are doing regression training for 15 minutes, using normalized root mean squared error as the primary metric). I recommend keeping the training timeout short while testing; you can later ramp it up to an hour or more once you are sure everything works as intended. You will also need to modify the script a little bit to make it work with your Azure Machine Learning workspace. Here’s the script:

import os
os.system(f"pip install azureml-train-automl-client==1.22.0")
import pandas as pd
import azureml.core
from azureml.core import Experiment, Workspace, Run
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget
from azureml.core.authentication import MsiAuthentication
from azureml.train.automl import AutoMLConfig

subscription_id = '…'
resource_group = '…'
workspace_name = '…'
msi_identity_config = {"client_id": "…"}

dataset_name = '…'
dataset_label_column = '…'
experiment_name = '…'
compute_name = '…'

msi_auth = MsiAuthentication(identity_config=msi_identity_config)
ws = Workspace(subscription_id=subscription_id,
               resource_group=resource_group,
               workspace_name=workspace_name,
               auth=msi_auth)
			   
dataset = Dataset.get_by_name(workspace=ws, name=dataset_name)
compute_target = ws.compute_targets[compute_name]

automl_config = AutoMLConfig(task='regression',
                             experiment_timeout_minutes=15,
                             primary_metric='normalized_root_mean_squared_error',
                             training_data=dataset,
                             compute_target=compute_target,
                             label_column_name=dataset_label_column)

experiment = Experiment(ws, experiment_name)

run = experiment.submit(automl_config, show_output=True)
run.wait_for_completion()

You can also download the template training script from my GitHub repo!

The modifications you need to do are as follows:

  1. On lines 11, 12 and 13 provide your Azure subscription ID, the name of the resource group your Azure ML workspace is in, and the name of the Azure ML workspace itself.
  2. On line 14 paste the client ID of your user assigned managed identity which you copied previously.
  3. On lines 16 and 17 give the name of the dataset you are using for training data, and the name of the column containing the values you want to predict.
  4. On lines 18 and 19 give the name of the Azure ML experiment the Auto ML training is created under, and the name of your compute cluster.
  5. You can make further modifications by changing the parameters of the AutoMLConfig object’s constructor. Check the SDK documentation for all available options.

Once you are done, save the script as “automltrainer.py” and compress it to a zip file.
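If you want to script the zipping step too, Python’s standard library is enough; a minimal sketch, assuming the script sits in your working directory:

import zipfile

# Package the training script into a zip archive for uploading to Azure ML
with zipfile.ZipFile("automltrainer.zip", "w") as archive:
    archive.write("automltrainer.py")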

Note! Line 2 of the training script installs the azureml-train-automl-client Python package with a specific version. Auto ML packages are not natively included in Azure ML’s Python environments, so you need to install them manually. I recommend pinning specific versions instead of installing the latest available version, to reduce the chance that future package updates cause version mismatch issues. Regardless, if at any point your Auto ML training script suddenly breaks, the first thing to check for should be possible package version conflicts!
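As an alternative to installing the package from within the script, you could also pin the dependencies in a dedicated Azure ML environment and attach that environment to the run configuration that executes the training script. A minimal sketch of that approach, with an arbitrary environment name:

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

# A dedicated environment for the Auto ML training script, with its
# dependencies pinned to specific versions
automl_env = Environment(name="automl-trainer-env")
automl_env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-train-automl-client==1.22.0"])

In this post I’m sticking with the pip install inside the script to keep things simple.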

Now that you’ve got the training script zipped, the next step is to upload it to your Machine Learning workspace as a file dataset (if you’d rather do this from code, there’s a sketch after the steps below):

  1. Navigate to the “Datasets” page and click “Create dataset” -> “From local files”
  2. Give your dataset a name and choose “File” as its type. Click “Next.”
  3. Click “Browse” and select the zip file containing your training script. Click “Next” and then “Create.”
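A rough SDK equivalent of the upload could look like this; the dataset name, target path and datastore are just examples:

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the zipped training script to the workspace's default blob datastore
datastore.upload_files(files=["automltrainer.zip"],
                       target_path="automl-trainer/",
                       overwrite=True)

# Register the uploaded file as a file dataset
script_dataset = Dataset.File.from_files(path=(datastore, "automl-trainer/automltrainer.zip"))
script_dataset.register(workspace=ws, name="automl-trainer-script", create_new_version=True)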

With that done, we’ve got all the things we need to finally bring this all together!

Creating the Auto ML training pipeline

The pipeline itself is quite simple, containing only two steps: loading the Auto ML training script from the file dataset, and then executing some Python code. First, let’s take a look at the script inside the “Execute Python Script” step:

import pandas as pd
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig, Run

def azureml_main(dataframe1 = None, dataframe2 = None):
    experiment_name = '…'

    run = Run.get_context(allow_offline=True)
    ws = run.experiment.workspace

    experiment = Experiment(ws, experiment_name)
    config = ScriptRunConfig(source_directory='./Script Bundle', script='automltrainer.py', compute_target='…')

    new_run = experiment.submit(config)

    return dataframe1,

Very simple stuff. The script gets a reference to the workspace (line 8) from the current run context, and then starts a new experiment in the workspace (line 13). The experiment is configured to execute the script “automltrainer.py” on a specific compute target, your compute cluster, and the script file is located in the folder ./Script Bundle/, which is where the files from your zip file end up (line 11). Finally, the script returns the variable dataframe1, which is there just because Azure Machine Learning expects the Execute Python Script step to return it. Feel free to modify the script to use a descriptive experiment name and to refer to your compute cluster.

Now, on to creating the pipeline:

  1. In Azure Machine Learning, navigate to the “Designer” page and click to create a new blank pipeline.
  2. From the “Datasets” group, drag the file dataset containing your training script onto the designer canvas. On the properties panel choose “Always use latest” as the version number – just to be safe.
  3. From the “Python Language” group, drag an “Execute Python Script” step onto the designer canvas and connect the file dataset to the rightmost connector on the script step. On the properties panel click “Edit code”, paste your version of the script from above and click “Save.”
  4. Open the settings pane for your pipeline with the cogwheel icon. Select a default compute target for the pipeline and give your pipeline a good descriptive name. Make sure to save your pipeline and then click “Submit” to start a test run!

If everything works as intended (and it always does on the first try, right? 🙂 ), the Execute Python Script step will get queued and will finish in a few minutes’ time. To monitor the actual execution of the Auto ML training, go to the “Experiments” page in Azure Machine Learning, where you will see the experiment started from the pipeline (in my case, “automated-auto-coffee-consumption-parent”) and the actual Auto ML training run (“automated-auto-coffee-consumption”). Click on these experiments to view the status of the runs and any logging information they have produced. If your runs finish with “Completed” status, you are ready for the last step of this post: scheduling the Auto ML training runs.
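If you prefer checking on the runs from code instead of the studio UI, a small sketch like this should do the trick; the experiment name is the one from my example, so replace it with your own:

from azureml.core import Experiment, Workspace

ws = Workspace.from_config()

# List recent runs of the Auto ML training experiment and their statuses
experiment = Experiment(ws, "automated-auto-coffee-consumption")
for run in experiment.get_runs():
    print(run.id, run.get_status())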

Running the Auto ML training graphical pipeline on a schedule

In order to run the pipeline on a schedule you first need to publish it. Go back to the pipeline in the graphical designer and click “Publish.” Choose to create a new pipeline endpoint, give your endpoint a nice descriptive name, and then click “Publish.”

Publishing a pipeline endpoint creates a REST endpoint which you can call with an HTTP POST request in order to trigger the pipeline at will. To get the endpoint URL, navigate to the “Pipelines” page, open the “Pipeline endpoints” tab and click on the name of the endpoint you just created. You will see the REST endpoint’s URL in the “Pipeline endpoint overview.”
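Before wiring up any scheduling, you can test the endpoint by calling it yourself from Python. Here’s a minimal sketch using interactive login for the bearer token; the endpoint URL is the one from the overview page and the experiment name is just an example:

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Interactive login is fine for a quick manual test; the scheduled trigger
# below uses the managed identity instead
auth = InteractiveLoginAuthentication()
headers = auth.get_authentication_header()

endpoint_url = "<the REST endpoint URL from the pipeline endpoint overview>"
response = requests.post(endpoint_url,
                         headers=headers,
                         json={"ExperimentName": "automated-auto-coffee-consumption-parent"})
print(response.status_code, response.text)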

Ultimately it’s up to you how you want to trigger this endpoint, but since I like keeping things as simple as possible, I created a Logic App for this purpose. My Logic App is very minimalistic, containing a recurrence trigger that fires every seven days and an HTTP action to call the endpoint. HTTP requests performed against the endpoint must be authenticated though, so I went ahead and used the same user assigned managed identity which I also assigned to my compute cluster. Here’s how to get the Logic App running:

  1. Create a new Logic App resource in Azure. Once it’s created, navigate to it in Azure Portal.
  2. Click “Identity” under “Settings” and open the “User assigned” tab. Add the same user assigned managed identity which you previously created for your compute cluster.
  3. Open the Logic App in designer view and add a recurrence trigger and an HTTP action.
  4. Configure the HTTP action as follows:
    • Method: POST
    • URI: The REST endpoint URL for your published pipeline
    • Body: { }
    • Authentication type: Managed Identity
    • Managed identity: The user assigned managed identity you added in step 2
  5. Save your Logic App

And that’s it! For real! In the next part of this series, I’ll show you how to get the automatically trained Auto ML model deployed automatically as well. Because that’s still something that needs to be done before the model is actually usable. Until next time, see ya!

