How to: Automating Automated Machine Learning in Azure Part 2 – Deploying models and preparing training data

This post is the second part in a two-part series on automating Automated Machine Learning. If you have not yet read the first part, I recommend checking it out first, since the solution shown here builds on top of where we left off last time.

Previously I showed you how to automate the training of machine learning models with Azure ML’s Auto ML feature to make sure your models remain up to date with the latest training data. Now it’s time to wrap that model training process into a full-fledged solution that includes both model deployment and training data preparation. With this, the automated workflow not only trains a new model each time it’s run, but also deploys it to production at the same time. In order to bring the whole package closer to being production ready, I’ll be expanding upon the original coffee prediction solution from before with a few changes:

  • In this example we will be deploying the machine learning models to Azure App Service instead of Azure Functions. There is no particular reason for choosing one over the other – you are free to do your implementation using either service – but I wanted to take the opportunity to show you another approach.
  • The master source of training data is still the same SharePoint list that was used in the original coffee prediction blog post. But since Dark Roast Ltd. has been using this list to track their coffee consumption daily, a mechanism is needed to automatically convert this data into a format that is usable by Azure Machine Learning. For this purpose, we’ll be expanding the Logic App that was previously created to include logic for preparing the training data.

In the previous part of this series, I described the process of automatically training an AutoML model as follows:

  1. An Azure Machine Learning graphical designer pipeline is run on a compute cluster
  2. The pipeline runs a custom Python script, which creates a new execution environment for our actual Auto ML training script
  3. The Auto ML training script is run on a compute cluster in its own environment
  4. The actual Auto ML training, performed by the training script, is also run on a compute cluster in its own environment

In order to get the coffee prediction training data preparation and model deployment into the same process, the expanded list of steps looks like this:

  1. A Logic App process retrieves the latest training data from a SharePoint list, converts it to a CSV file and saves it to Azure Storage
  2. After preparing the training data, the Logic App triggers an Azure Machine Learning graphical designer pipeline to be run on a compute cluster
  3. The pipeline runs a custom Python script, which creates a new execution environment for our actual Auto ML training script
  4. The Auto ML training script is run on a compute cluster in its own environment
  5. The actual Auto ML training, performed by the training script, is also run on a compute cluster in its own environment
  6. Once the Auto ML training finishes, the training script registers the model in the Azure ML workspace, packages it to a Docker container, deploys the container to a container registry and triggers our App Service to retrieve the latest container version

Note! I am using a SharePoint list as the source of training data for the sake of simplicity. Your actual data source might be wildly different, and as such the actual technical implementation can vary quite a bit. Regardless, the overall process itself should remain very similar.

Let’s get started with the more fun part, which is getting the trained model deployed into an App Service!

Automatically deploying an Azure Machine Learning model into an App Service

Deploying an Azure ML model into an App Service is pretty straightforward once the whole process has been implemented. However, to get there, a few things need to be done first, in this order:

  1. You will need to create an initial Docker image for your model in a container registry using the training script below.
  2. Then, with the Docker image in place, you can create the App Service, configure it to use your Docker image, enable continuous deployment for the service and retrieve a webhook URL which you can use to tell your App Service to update its Docker image.
  3. Finally, you can add the webhook URL into your training process for a finished training script.

In the previous part of this series, I provided you with a template training script needed for running an AutoML training process. This time we’ll expand on that same script a little to include model registration, Docker deployment and triggering of the continuous deployment process in our App Service:

import os
os.system("pip install azureml-train-automl-client==1.24.0")
import pandas as pd
import requests
import azureml.core
from azureml.core import Experiment, Workspace, Run
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget
from azureml.core.authentication import MsiAuthentication
from azureml.train.automl import AutoMLConfig
from azureml.core.model import Model
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.conda_dependencies import CondaDependencies

#Azure Machine Learning workspace settings
subscription_id = '…'
resource_group = '…'
workspace_name = '…'
msi_identity_config = {"client_id": "…"}

#AutoML run settings
dataset_name = '…'
dataset_label_column = '…'
experiment_name = '…'
compute_name = '…'

#Model registration and Docker container publish settings
model_name = '…'
docker_image_name = '…'
docker_image_label = 'latest'
docker_webhook_url = '…'

#Retrieve the Azure ML Workspace
msi_auth = MsiAuthentication(identity_config=msi_identity_config)
ws = Workspace(subscription_id=subscription_id,
               resource_group=resource_group,
               workspace_name=workspace_name,
               auth=msi_auth)

#Retrieve dataset and compute for the AutoML run
dataset = Dataset.get_by_name(workspace=ws, name=dataset_name)
compute_target = ws.compute_targets[compute_name]

automl_config = AutoMLConfig(task='regression',
                             experiment_timeout_minutes=30,
                             primary_metric='normalized_root_mean_squared_error',
                             training_data=dataset,
                             compute_target=compute_target,
                             label_column_name=dataset_label_column)

#Execute the AutoML run
experiment = Experiment(ws, experiment_name)
run = experiment.submit(automl_config, show_output=True)
run.wait_for_completion()

#Get the best model from the AutoML run and register it
best_run = run.get_best_child()
best_run.download_files(prefix='outputs', append_prefix=False)
model = Model.register(model_path='outputs/model.pkl',
                       model_name=model_name,
                       workspace=ws)

#Prepare an environment for the model
myenv = Environment.from_conda_specification(name='project_environment', file_path='outputs/conda_env_v_1_0_0.yml')
myenv.docker.enabled = True
inference_config = InferenceConfig(entry_script='outputs/scoring_file_v_1_0_0.py', environment=myenv)

#Create Docker container for the model
package = Model.package(ws, [model], inference_config,
                        image_name=docker_image_name,
                        image_label=docker_image_label)
package.wait_for_creation(show_output=True)

#Update web app with the latest container
requests.post(docker_webhook_url)

You can also download the template training script from my GitHub repo!

The modifications you need to do to this template are as follows:

  1. On lines 17, 18 and 19 provide your Azure subscription ID, the name of the resource group your Azure ML workspace is in and the name of the Azure ML workspace itself.
  2. On line 20 paste the client ID of your user assigned managed identity which you retrieved in the previous part of this series.
  3. On lines 23 and 24 give the name of the dataset you are using for training data, and the name of the column that contains the values you want to predict.
  4. On lines 25 and 26 give the name of the Azure ML experiment the Auto ML training is created under, and the name of your compute cluster.
  5. On lines 29 and 30 give the name for the Azure ML model you will use when registering the trained model and the name for your Docker image. Leave the Docker image label (line 31) as-is, and we’ll add the webhook URL (line 32) later once the App Service has been configured.
  6. You can make further modifications by changing the parameters passed to the AutoMLConfig constructor. Check the SDK documentation for all available options.

Once you are done, save the script as “automltrainer.py” and compress it into a zip file. Now that you’ve got the training script zipped, the next step is to upload it as a new version of the training script dataset in the Machine Learning workspace you created in the previous part of this series:

  1. Navigate to the “Datasets” -page and click on the dataset you previously created
  2. Click “New version” and “From local files”
  3. Click “Next”, then “Browse” and select the zip file containing your training script. Click “Next” and then “Create.”

With the training script updated, you can now manually trigger the training pipeline you created during the previous part of this series. The pipeline will pick up the new script automatically (assuming you have configured the training script dataset to always use its latest version in the pipeline!). With this updated version, once the training completes, the resulting model is registered into your Azure Machine Learning workspace and then uploaded into your container registry as a Docker image. Run your training pipeline and wait for it to complete fully before moving on to the next step:

  1. In Azure ML open the “Designer” page and click on your training pipeline
  2. Click “Submit” and use your previous experiment for executing the pipeline
  3. Wait a good while (maybe have a coffee break here, too?) 🙂

Creating an App Service for Azure Machine Learning continuous deployment

With the first Docker image created and uploaded to the container registry we can now go ahead and create the App Service and configure it for continuous deployment with Azure ML:

  1. Use Azure Portal to create a new Web App resource. Select “Docker Container” as your publish option and Linux as the operating system for the service. After filling in the rest of the options, click “Next” to configure Docker.
  2. On the next page, select “Azure Container Registry” as your image source and then select the registry that is associated with your Machine Learning workspace. From the “Image” drop-down pick the option you set as the name of your Docker image in the training script (line 30). The “Tag” value should be “latest.” Then go ahead and create the Web App resource.
  3. Once the Web App has been provisioned, open it in Azure Portal and go to Deployment Center. In there, enable Continuous deployment and click “Save.” Finally, copy the webhook URL value.

Note! If you have previously used Azure Functions as the deployment target for Azure Machine Learning models and are now trying out App Service for the first time, it’s important to note that the URL used to call the model is different in App Services. With App Services the URL to send your HTTP POST requests to is https://WEB-APP-NAME.azurewebsites.net/score.
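To make the difference concrete, here is a small hedged sketch of calling the model through the App Service. The web app name and column names are made up for illustration, and the exact payload shape depends on the scoring script AutoML generated for your model (the “data” wrapper shown here is typical of AutoML scoring scripts, but verify it against your own scoring_file_v_1_0_0.py):

```python
import json

# Hypothetical helper that builds the scoring request for an App Service
# deployment. The web app name and input columns below are illustrative only.
def build_score_request(web_app_name, rows):
    url = f"https://{web_app_name}.azurewebsites.net/score"
    payload = json.dumps({"data": rows})
    headers = {"Content-Type": "application/json"}
    return url, payload, headers

url, payload, headers = build_score_request(
    "dark-roast-predictor",
    [{"Time of day": "Morning", "Number of Finnish people": 3}])

# To actually call the model you would then run:
# import requests
# response = requests.post(url, data=payload, headers=headers)
# print(response.json())
```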

With this your App Service has been created and configured to use your machine learning Docker image. The service is now also enabled for continuous deployment, so all that’s left to do for this part is to update our training script.

Enabling continuous deployment in the Azure Machine Learning training script

This step is quite easy: Open up the training script you modified previously. On line 32 there’s a variable “docker_webhook_url” – paste the webhook URL you copied a minute ago here. Save the script, add it to a zip archive and upload this new version to the Azure ML training script dataset following the same steps as above.

With this the first half of this post is done – now we move on to preparing the training data.

Expanding the timer Logic App to export the training data from SharePoint to Azure Storage

In the previous part of this series, I showed how to create a very simple timer Logic App which triggers the training pipeline once every week. Now we will expand on that same Logic App to add functionality for preparing the training data from SharePoint. But first, before your Logic App can connect to SharePoint, you need to give your Logic App’s managed identity read permissions to SharePoint. For this you need to retrieve the Object ID of the managed identity resource which you previously created:

You also need to install the Azure Active Directory PowerShell for Graph module if you haven’t done so already; giving the managed identity permission to connect to SharePoint can currently be done only via PowerShell using this module. You will also need the ID of your Azure AD tenant. With the tenant and object IDs retrieved and the PowerShell module installed, open PowerShell and run the following lines one by one:

$tenantId = 'YOUR-TENANT-ID'
$objectId = 'YOUR-OBJECT-ID'

# Sign in to your Azure AD tenant
Connect-AzureAD -TenantId $tenantId
# Get the service principal that represents SharePoint Online in your tenant
$app = Get-AzureADServicePrincipal -Filter "DisplayName eq 'Office 365 SharePoint Online'"
# Find the Sites.Read.All application permission exposed by SharePoint Online
$perm = "Sites.Read.All"
$role = $app.AppRoles | Where-Object {$_.Value -eq $perm -and $_.AllowedMemberTypes -contains "Application"}
# Assign that permission to the managed identity
New-AzureAdServiceAppRoleAssignment -ObjectId $objectId -PrincipalId $objectId -ResourceId $app.ObjectId -Id $role.Id

With the permissions assigned you can now move on to implementing the Logic App itself. The complete Logic App will look like this once we are done:

So, let’s get started! I’ll explain the implementation piece by piece, but you can also get a copy of my Logic App in JSON from my GitHub repo and use that as a basis for your own implementation. The template is the same as the workflow shown above, except with some variables and the “Create blob” -action from the end removed. These need to be added in manually if you wish to use the template.

The expanded Logic App starts with initializing a set of variables:

  • SiteUrl, a string: the URL of the SharePoint site containing the list that holds your training data
  • ListGuid, a string: the GUID of the list that holds your training data
  • Columns, a string: an OData select statement listing all of the list columns you want to retrieve, using their internal names. For example, in my Logic App the Columns value is: “?$select=Title,Time_x0020_of_x0020_day,Number_x0020_of_x0020_Finnish_x0,Number_x0020_of_x0020_British_x0,Number_x0020_of_x0020_other_x002,Cups_x0020_of_x0020_coffee_x0020”
  • RequestUrl, a string containing the following expression: “@{variables(‘SiteUrl’)}/_api/web/lists(guid’@{variables(‘ListGuid’)}’)/items@{variables(‘Columns’)}”
  • Items, an Array

Next in the Logic App is an Until-loop which checks the length of the RequestUrl-variable with the following expression: “@less(length(variables(‘RequestUrl’)), 1)”. The loop terminates once the length of the variable is less than 1. Inside the loop the Logic App retrieves items from the SharePoint list with an HTTP request, maps the results into an array and then appends the new items to the Items-variable. Finally, if there are more than 100 items in the SharePoint list (and there are!), the item set retrieved from SharePoint contains a URL for retrieving the next 100 items. That URL is assigned to the RequestUrl-variable, and the loop iterates until all of the items have been retrieved from SharePoint (which we can tell has happened once SharePoint no longer returns a new URL for getting the next set of results).
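For clarity, the same paging pattern can be sketched in Python. This is only an illustration of the control flow: fetch_page stands in for the HTTP action, the URLs and payloads are made up, and the sketch simply concatenates pages where the Logic App uses a union()-expression:

```python
# Rough sketch of the Until-loop's paging logic. fetch_page is a stand-in for
# the "Get items" HTTP action against the SharePoint REST API.
def fetch_all_items(request_url, fetch_page):
    items = []
    while len(request_url) >= 1:                 # the Until-loop's exit condition
        body = fetch_page(request_url)            # HTTP GET action
        items = items + body["d"]["results"]      # Select + Compose + Set variable
        # SharePoint returns "__next" only while more results remain
        request_url = body["d"].get("__next", "")
    return items

# Example with a fake two-page result set:
pages = {
    "https://example/api?page=1": {"d": {"results": [{"Title": "Mon"}],
                                         "__next": "https://example/api?page=2"}},
    "https://example/api?page=2": {"d": {"results": [{"Title": "Tue"}]}},
}
all_items = fetch_all_items("https://example/api?page=1", pages.get)
# all_items now contains both pages' results in order
```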

Things of note here are:

  1. The HTTP request uses the GET-method and the RequestUrl-variable for the URI. You need to provide the Accept-header with value “application/json;odata=verbose”. You also need to set the authentication type to Managed identity, select your user assigned identity and set the audience to the domain of your SharePoint tenant (i.e. https://xyz.sharepoint.com)
  2. Use the Select-action (under Data Operations) to map SharePoint list items to array items. This action renames the list item columns to how they will appear in the final CSV file. Use the expression “@body(‘Get_items’)[‘d’][‘results’]” for the From-field and then use expressions like the following in the right column of the Map-table: “@item()[‘Number_x0020_of_x0020_British_x0’]”. Note that with SharePoint you need to use the internal column names instead of display names!
  3. Use the Compose-action (also under Data Operations) to add the new items and all previously retrieved items to a temporary array with the following Input-expression: “@union(variables(‘Items’), body(‘Map_list_items_to_arrays’))”
  4. Use the Set variable-action to set the output of the Compose-action to the Items-variable
  5. Use the Set variable-action with the following expression to retrieve the next URL from the results of the HTTP request: “@{if(contains(body(‘Get_items’)[‘d’],’__next’), body(‘Get_items’)[‘d’][‘__next’], ”)}”

Below is what these actions look like in my Logic App:

What’s left for the Logic App is to convert the array of items into CSV and then save that to Azure Storage. For that, add a Create CSV table-action (under Data Operations) and set its From-field to the Items-variable. Then, add a Create blob-action (under Azure Blob Storage). For this action you will need to create a connection to the Azure Storage instance that contains your training dataset. Configure the action to overwrite the dataset csv file and assign the output of the Create CSV table-action as the blob contents. The image below shows what this looks like for me:
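The Create CSV table-action essentially produces a header row from the item keys plus one row per item. A small Python sketch of the same transformation, with made-up column names, would look roughly like this:

```python
import csv
import io

# Sketch of what "Create CSV table" does to the Items array: header row from
# the item keys, then one row per item. Column names here are illustrative.
def items_to_csv(items):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(items[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()

csv_text = items_to_csv([
    {"Date": "2021-03-01", "Cups of coffee": 12},
    {"Date": "2021-03-02", "Cups of coffee": 9},
])
# csv_text starts with the header line "Date,Cups of coffee"
```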

And that’s it! The next time the Logic App triggers, it will generate a new file containing the latest training data before triggering the Azure Machine Learning pipeline. The training script used by the pipeline will then register the best model generated by Auto ML, deploy it to the container registry as a Docker image and tell your App Service to retrieve the latest container image.

What’s next?

Quite frankly, not much! This two-part series has shown how to build an automated training and deployment workflow for a very simple AutoML case. Your process for preparing the training data may look different if you are using something other than SharePoint lists as your data source – especially if you are combining multiple data sources into one! Additionally, in this example I was training a regression model, and if you are looking to use classification or forecasting instead, you’ll want to modify the training script appropriately. But even then, the examples here should get you started. 🙂
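As a rough illustration of that last point, switching the training script to classification mainly means changing the task and the primary metric passed to AutoMLConfig. Treat the metric name below as a sketch and check the AutoMLConfig reference for the full list of supported values:

```python
# The regression settings used in this series' training script:
regression_settings = {
    "task": "regression",
    "primary_metric": "normalized_root_mean_squared_error",
}

# A hypothetical classification variant; AUC_weighted is one of the metrics
# the AutoML SDK documents for classification tasks.
classification_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
}

# These could then be unpacked into the constructor in the training script,
# e.g. AutoMLConfig(**classification_settings, training_data=dataset, ...)
```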

With that said, until next time!

