Machine learning can be used to solve different kinds of problems. In my previous post Using Azure Machine Learning and SharePoint to predict coffee consumption I presented a case where the aim was to predict numbers. Machine learning models used for these kinds of problems are called regression models. This time I will take a look at another common case: Classification. Whereas regression models will give predictions akin to “I think the result is about this much”, classification’s predictions will be similar to “I’m quite certain the answer, out of the options X, Y or Z, will be Y.” But what does this mean in practice? For that we will go back to our old friends at Dark Roast Ltd. and take a look at their hugely popular cat owners’ association.
Through thick and thin cat hair
At Dark Roast Ltd. the very first employee hobby club ever was their cat owners’ association. This was mostly because nearly everyone in the company was – and still is – a cat owner, so it was quite natural to start a club on such common ground. One of the key components of the cat club is their Cat-a-log, a registry of all cats owned by members of the club. To date, there are over 700 cats in the Cat-a-log, which is quite the number considering Dark Roast Ltd. has only around 150 employees! For each cat they entered the cat’s name, its hair length in centimeters, the thickness of its hair from 1 to 5, the cat’s sassiness from 1 to 10 and its favourite hairbrush, which is either the Purrrminator, the Max Smoothener, the Mr. Whisker’s Zoomergroomer or the Gentlycomb.
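To make the shape of the data concrete, here are a few illustrative rows in roughly the Cat-a-log’s format. Note that the exact column headers (and the cats!) are my own invention for illustration – the registry only tells us the columns described above exist.

```python
import csv
import io

# A few made-up rows following the Cat-a-log's described columns.
# Header names are assumptions for illustration purposes only.
sample_csv = """Name,HairLength,HairThickness,Sassiness,Brush
Misty,4.5,3,7,Purrrminator
Snowball,9.0,5,2,Mr. Whisker's Zoomergroomer
Pepper,1.5,1,10,Gentlycomb
"""

# Parse the CSV into dictionaries, one per cat
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(rows[0]["Brush"])  # → Purrrminator
```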
While sharing cat stories the people at Dark Roast Ltd. came to realize that some way of predicting which brush works best for their new cats would both save money and prevent being clawed by annoyed fur balls. Having had some success with machine learning before, they decided to see whether the data in their Cat-a-log could be utilized to train a model that could help with this. So once more they extracted their data into a CSV file (I’ve added a copy of the data as an Excel file below so you can play around with it yourself) and then went to Azure Machine Learning to train a classification model in Auto ML.
Of course, simply having a working machine learning model isn’t useful by itself, so at Dark Roast Ltd. they also had to figure out a way to utilize its predictions. Since the Cat-a-log was an already existing solution, they decided that integrating the model with it would make a lot of sense. In the end they decided on creating a Logic App workflow that triggers whenever a new cat is added to the Cat-a-log (Power Automate would have worked just as well, but it would have required a Premium license to use!). For the purposes of the workflow, they added a new option to the Favourite Brush column: “I don’t know, tell me what to try!”, which tells the Logic App to fetch a prediction to suggest.
Now let’s see what it takes to make that specification reality!
Training a classification model in Auto ML
At Dark Roast Ltd. they decided to use Azure Machine Learning’s Auto ML (or Automated Machine Learning) feature for training their favourite-hairbrush classification model, since it simplifies many tasks involved in typical model training projects. With Auto ML their software developers could train models without necessarily having any data science background – especially in the case of a very simple model such as the classification model they were creating this time. An understanding of data science and machine learning algorithms is still valuable even in Auto ML projects when it comes to knowing how the models work and estimating whether the model’s predictions are actually any good – but it’s not a requirement for getting started!
If you haven’t used Azure Machine Learning before, you can find it in the Azure portal via the Create a resource blade. Besides the usual settings of a resource group, name and region, you need to associate the Machine Learning resource with four other Azure resources: a storage account, a key vault, an Application Insights instance and a container registry. You can create all of these from the Machine Learning creation page. Once you’re done creating the Machine Learning resource, wait for it to be provisioned and then open up the resource’s blade. Click the “Launch studio” button to open the Azure Machine Learning interface.
Once you have Azure Machine Learning open, you can get started with Automated Machine Learning by clicking the “Start now” button under Automated ML, and then clicking “New Automated ML run.” When creating an Auto ML run there are three things that need to be defined: the dataset containing the training data (in this case the Excel file above, converted into a CSV), the run itself and a compute cluster, which is a virtual machine cluster in Azure used for performing the machine learning run. The steps listed below contain the absolute minimum needed to get an Auto ML project running with the cat hairbrush data – Auto ML has lots of options that let you fine-tune the runs further, but I’m opting to skip those for brevity.
If you are already familiar with Auto ML, feel free to skip to “Integrating machine learning with Logic Apps.” For the rest of you: once you’ve clicked the “New Automated ML run” button, let’s get started with the dataset.
Creating a new dataset
- Click “Create dataset” and select “From local files”
- Give the dataset a name, such as “cat-hair”, and click Next
- Click “Browse” and select the CSV file containing data for the dataset, and click Next
- From the “Column headers” dropdown select “Use headers from the first file” and click Next
- Deselect the Name column by clicking on the toggle control in the Include column. We can be fairly sure that the cats’ names are not relevant for predicting their favourite brushes, so we’ll leave them out! Click Next, verify the summary is ok and save the dataset
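If you prefer to prepare the file before uploading it, the same Name-column exclusion can be done in code instead of the dataset wizard. A minimal sketch using only the standard library (column names are assumptions for illustration):

```python
import csv
import io

def drop_column(csv_text: str, column: str) -> str:
    """Return the CSV with the given column removed from header and rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [f for f in reader.fieldnames if f != column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # the dropped column's value is ignored
    return out.getvalue()

# Drop the irrelevant Name column before uploading the dataset
trimmed = drop_column("Name,Sassiness,Brush\nMisty,7,Purrrminator\n", "Name")
print(trimmed)
```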
Configuring the Auto ML run
- Now that you have created the dataset, select it by clicking the checkbox next to its name and click Next
- Give your Auto ML run (or experiment) a name and select a Target column. The target column is the column in the data which you will be trying to get predictions for. In this example, it’s “Brush”
- Select a compute cluster for your run. If you haven’t created one before, click “Create a new compute” and follow the steps in the “Creating a new compute cluster” section below. Finally, click Next
- Verify that Classification is the selected task type. Note that there are two links below the task types for further configuring the run: “View additional configuration settings” and “View featurization settings.” We’ll be skipping those in this blog, but just be mindful that they are there, and they are important
- Finally, click Finish to get the run started
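For the curious, the same run can also be configured in code with the Azure Machine Learning Python SDK instead of the portal wizard. This is only a configuration sketch – it assumes you have a workspace config file and a registered dataset, and the experiment and cluster names are made up:

```python
from azureml.core import Dataset, Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()  # assumes config.json downloaded from the portal
dataset = Dataset.get_by_name(ws, name="cat-hair")

# Mirrors the portal choices: classification task, Brush as target column
automl_config = AutoMLConfig(
    task="classification",
    training_data=dataset,
    label_column_name="Brush",
    compute_target="cat-cluster",  # hypothetical cluster name
    primary_metric="accuracy",
)

run = Experiment(ws, "cat-hairbrush").submit(automl_config)
run.wait_for_completion(show_output=True)
```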
Creating a new compute cluster
- After clicking “Create a new compute” a new blade opens up for configuring your compute cluster. The first choice to make is between a dedicated and a low priority cluster. I recommend low priority clusters for personal testing since they are significantly cheaper, but they are a poor choice for production environments since Azure can pre-emptively shut them down whenever the capacity is needed for dedicated workloads. Let’s go with low priority in this case
- Next is a choice between CPU-only virtual machines or ones with GPUs included. The latter choice is relevant for image recognition and neural networks, but for us CPU is more than fine
- Then it is time to select a virtual machine size. Something modest, like Standard_D2_v3, will be great here. Click Next
- Give your compute cluster a name
- You can also change the minimum and maximum number of nodes assigned to the cluster. It’s best to leave the minimum at 0 so you won’t get running costs even when the cluster is idle, and a maximum of 1 is enough for testing. But you might want to increase the maximum to speed up your machine learning runs
- Finally, verify that the idle seconds before scale down is a reasonable number (such as 120 seconds) and click Finish to create the compute cluster
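The cluster choices above can likewise be expressed with the Python SDK. Again a configuration sketch only, assuming an existing workspace, with “cat-cluster” as a made-up name:

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# Mirrors the portal choices: low priority, CPU-only VM size,
# scale-to-zero when idle, a single node for testing
config = AmlCompute.provisioning_configuration(
    vm_size="Standard_D2_v3",
    vm_priority="lowpriority",
    min_nodes=0,
    max_nodes=1,
    idle_seconds_before_scaledown=120,
)

cluster = ComputeTarget.create(ws, "cat-cluster", config)
cluster.wait_for_completion(show_output=True)
```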
Once you have got the Auto ML run created and running it will take a while for it to complete. Feel free to take a break here and come back a little while later once the run has completed.
Integrating machine learning with Logic Apps
The very first thing to do before the model can be used with Logic Apps is to deploy it somewhere. Since Dark Roast Ltd. already had an existing Linux App Service Plan in their Azure subscription, which they used for deploying their previous coffee prediction model, it was an easy choice to re-use the same plan with a new Function App. Deployed this way, they wouldn’t incur any new costs, since the new model would simply share the same compute resources as the previous one. For an in-depth look at the practical steps of this deployment, see How to: Easily deploying Azure Machine Learning models to Azure Functions.
Once the model is deployed it’s time to create the Logic App itself. If you are new to Logic Apps, I recommend getting started with Microsoft’s free online material, such as Introduction to Azure Logic Apps. The Cat-a-log Logic App begins with the “When an item is created” SharePoint trigger, which is configured to monitor the Cat-a-log SharePoint list. The next action is a condition which checks whether the added cat’s Favourite Brush value is “I don’t know, tell me what to try!” If the condition is true, the cat’s favourite brush is predicted and the result is e-mailed to the person who added the cat to the Cat-a-log, along with a generic “Thank you for registering your cat!” message. If the condition is false, a generic e-mail without any predictions is sent to the cat owner instead.
Getting the predictions in a Logic App from a machine learning model that is deployed as an Azure Function is done easily via the HTTP action. At minimum it takes three items to set up an HTTP action for machine learning models: the URI the model is deployed to, the function’s code (its function key), which we’ll use for authentication, and the parameters we’ll pass to the model to make predictions from. If you look carefully at the HTTP request’s body below, you’ll notice the data variable in the JSON message is actually an array. So, yes, you could add multiple sets of parameters into one HTTP request to perform batch predictions as well! For now, getting predictions one by one is a good approach for Dark Roast Ltd., but who knows if some day they’ll be getting hundreds of brush predictions a day? Having the option of performing predictions in batches will be useful in such a case.
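To illustrate the shape of that request body, here is a small Python sketch that builds the same kind of JSON payload the HTTP action sends. The parameter names follow the Cat-a-log columns and are assumptions; the array under data is what makes batching possible:

```python
import json

def build_request(cats):
    """Build the JSON body for the prediction request.

    Each cat is a dict of feature values; several cats can be
    put into the data array for a batch prediction.
    """
    return json.dumps({"data": cats})

# A single-cat request, as the Cat-a-log Logic App sends it
body = build_request([
    {"HairLength": 4.5, "HairThickness": 3, "Sassiness": 7},
])
print(body)
```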
Note! Including your Function App’s code like this, straight in the Logic App in plain text, is a naïve approach that I’ve opted to use for demonstration purposes only, and it’s not something I recommend for production deployments. As more robust and secure alternatives I recommend using Azure AD and managed identities for authentication – or at the very least putting your code into a Key Vault!
Results from the deployed model are returned as JSON strings, so in the Logic App expressions are needed to access these predictions. In the case of the Cat-a-log Logic App, the expression needed to get the cat hairbrush prediction is:
If Dark Roast Ltd. had opted to use batch predictions, where there are multiple sets of parameters sent to the machine learning model at once, the results would be returned in an array in the same order as the parameters were passed. In our current scenario there was only one set of parameters so we are able to hardcode the final array index 0 in the expression above, but in a batch scenario you would loop through this array to get to the other results. With this, the e-mail sending action looks like this:
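The indexing logic described above can be sketched in plain Python. The exact JSON shape depends on your scoring script, so treat the result property name and values here as assumptions:

```python
import json

# A hypothetical response body from the deployed model,
# here containing two batch predictions
response_body = '{"result": ["Purrrminator", "Gentlycomb"]}'

predictions = json.loads(response_body)["result"]

# Single-prediction scenario: hardcode index 0,
# just like the Logic App expression does
first = predictions[0]

# Batch scenario: results come back in the same order
# as the parameter sets were sent, so loop through them
for i, brush in enumerate(predictions):
    print(i, brush)
```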
And the resulting e-mail message once someone at Dark Roast Ltd. adds a new cat into the Cat-a-log will look like this:
So that’s all it takes to get an Azure Machine Learning model integrated with Logic Apps (or Power Automate)! Pretty simple, huh? Of course, the guys at Dark Roast Ltd. were not entirely satisfied with just this, and they already started planning on some additional developments: How about adding links in the e-mail message for confirming whether the prediction was correct or not, and then automatically re-training the model based on that input? Well, turns out they got pretty swamped with customer projects and haven’t had a chance to do that just yet, but maybe some time in the future?
Until next time, see ya!