How to: Build robust and cost-effective Azure SSIS runtime automation in Data Factory

Azure Data Factory is the de facto service to use when you want to orchestrate data movement and transformation in Azure. From a design and architecture point of view, my favourite thing about Data Factory is that it lets you do ETL – or ELT – in many different ways, leaving it up to you to choose the most appropriate solution for your situation. The best way to think about Data Factory is as a two-layer service: on top is the orchestration layer, with pipelines, triggers and alerts managing and, well, orchestrating your processes. Below that, at the heart of Data Factory, are its runtimes: a selection of several different environments where the actual data movement and wrangling magic happens. The choice of runtime determines what kinds of tools are available to you, how much Data Factory is going to cost and what data stores you can connect to. As such, your overall technical situation (where your data rests, what you need to do with the data, and so on) often dictates which Data Factory runtimes you select.

One of these Data Factory runtimes is the Azure SSIS-IR, or Azure SQL Server Integration Services Integration Runtime. By today’s standards SSIS is old technology, but in the ever-changing world of the cloud it also has something very respectable to offer: technical maturity. SSIS is more feature-rich than Data Factory’s alternative, Data Flows, meaning that you can use SSIS to implement more complicated ETL scenarios than the other code-free alternatives in Data Factory support. And, of course, you can also use SSIS’s Script Tasks to add code for even more powerful data transformations. Not only that, but SSIS lets you bring old skills to the cloud, which makes it an easy entry point for on-prem data engineers transitioning into Azure. And I haven’t even mentioned Azure SSIS’s support for VNETs, or the ability to install custom ODBC drivers on your SSIS virtual machines to connect to a wide array of different data sources!

Note! Data Factory currently has a feature for Managed Virtual Networks in preview, which enables you to use Data Flows with data sources protected with a VNET. However, since this is still a preview feature, at the time of writing Azure SSIS is the only production-ready Data Factory runtime for integrating with VNETs.

There is a downside to Azure SSIS, though: it’s fairly expensive. At the time of writing the cheapest virtual machine option, a mere single-core D1V2, costs 364 € / month in the West Europe data center if it’s run 24/7. I’m pretty sure no one wants to pay that much just for SSIS, and fortunately, no one has to: automating the starting and stopping of the virtual machine based on when it’s actually needed lets you cut those costs significantly. For example, if your SSIS virtual machine is needed for only an hour each day, a properly automated D1V2 VM would cost only a little over 15 € / month!
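To make the savings concrete, here is the back-of-the-envelope arithmetic, using the D1V2 West Europe price quoted above and Azure's usual 730-hour billing month:

```python
# Rough Azure SSIS-IR cost estimate, based on the prices quoted in the text.
HOURS_PER_MONTH = 730            # Azure's standard billing month
ALWAYS_ON_EUR = 364.0            # D1V2 running around the clock

hourly_rate = ALWAYS_ON_EUR / HOURS_PER_MONTH    # ~0.50 EUR per hour
one_hour_per_day = hourly_rate * 30              # ~15 EUR for 30 hours/month
print(f"{hourly_rate:.2f} EUR/h, {one_hour_per_day:.2f} EUR/month")
```

With one hour of runtime per day that works out to roughly 15 € / month, matching the figure above.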

Why don’t we take a look at how that automation can be done – in a manner that’s also robust and capable of recovering from Azure’s transient server errors?

Note! Most of the Azure SSIS pricing comes from SSIS license fees. If your organization has applicable Software Assurance licenses you can use them to get anything from 33 % to 88 % off the SSIS integration runtime prices. Read more on Microsoft’s FAQ.

Elements of Azure SSIS runtime automation

Automating the starting and stopping of an Azure SSIS runtime is built around Data Factory’s management APIs, which can be used either via plain HTTP requests or via Azure PowerShell cmdlets. There are various sources on the internet describing how to perform the actual runtime start and stop actions from Data Factory; for example, Microsoft’s How to schedule Azure SSIS integration runtime demonstrates using Azure Automation to run a PowerShell script which is then called from a Data Factory pipeline. This is a good article for setting up a Data Factory environment that works most of the time for SSIS – but it could be better. Mainly:

  1. Transient errors occasionally occur when starting an Azure SSIS runtime. Azure does not return an error message when you first request the runtime to start, so the only way to catch these errors is to repeatedly check the runtime status and perform recovery actions if the status has an unexpected value.
  2. In the example above the PowerShell script is scheduled in Azure Automation. It is better to perform this scheduling in Data Factory to minimize the runtime’s idle time and maximize savings.

The PowerShell script provided by Microsoft in the documentation above performs starting and stopping of the runtime, which is a good starting point, but we also need a third piece of functionality: retrieving the status of the runtime. I’ve modified the script below, adding a third operation, “STATUS”, which uses the cmdlet Get-AzDataFactoryV2IntegrationRuntime and returns status information to Data Factory. To support returning values from Azure Automation I have also added an extra parameter, WebhookData, from which the script retrieves the variable callBackUri if it is present. Data Factory passes this variable to web hooks when using the Webhook activity, providing a URL which the called service (in this case, Azure Automation) can call back with return values (in this case, the status of the SSIS runtime) once the operation finishes. You can find more on the Webhook activity in Microsoft’s documentation.

Param
(
      [Parameter (Mandatory= $true)]
      [String] $ResourceGroupName,
      [Parameter (Mandatory= $true)]
      [String] $DataFactoryName,
      [Parameter (Mandatory= $true)]
      [String] $AzureSSISName,
      [Parameter (Mandatory= $true)]
      [String] $Operation,
      [Parameter (Mandatory = $false)]
      [object] $WebhookData
)
$connectionName = "AzureRunAsConnection"
try
{
    # Get the connection "AzureRunAsConnection"
    $servicePrincipalConnection=Get-AutomationConnection -Name $connectionName         
    "Logging in to Azure..."
    Connect-AzAccount `
        -ServicePrincipal `
        -TenantId $servicePrincipalConnection.TenantId `
        -ApplicationId $servicePrincipalConnection.ApplicationId `
        -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint 
}
catch {
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    } else{
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}
if($WebhookData){
	$parameters=(ConvertFrom-Json -InputObject $WebhookData.RequestBody) 
	if($parameters.callBackUri) {
		$callBackUri = $parameters.callBackUri
	}
}
if($Operation -eq "START") # -eq is case-insensitive in PowerShell
{
    "##### Starting #####"
    Start-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryName -Name $AzureSSISName -Force
}
elseif($Operation -eq "STOP")
{
    "##### Stopping #####"
    Stop-AzDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $AzureSSISName -ResourceGroupName $ResourceGroupName -Force
}  
elseif($Operation -eq "STATUS")
{
    "##### Getting status #####"
    $statusInfo = Get-AzDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $AzureSSISName -ResourceGroupName $ResourceGroupName -Status

	if($callBackUri){
		# Return the runtime status to the callback URL generated by Data Factory's Webhook activity
		Invoke-WebRequest -Uri $callBackUri -Method POST -Body ($statusInfo | ConvertTo-Json) -ContentType "application/json" -UseBasicParsing
	}
}
"##### Completed #####" 
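For reference, when the runbook is invoked through Data Factory’s Webhook activity, the request body it receives in WebhookData looks roughly like the sketch below. The callBackUri is generated by Data Factory and appended to whatever body you configure on the activity; the URL shown here is an illustrative placeholder, not a real endpoint. (Note that the runbook’s mandatory parameters, such as Operation, are set when you create the webhook in Azure Automation, not in this body.)

```json
{
    "callBackUri": "https://<region>.svc.datafactory.azure.com/workflow/callback/<generated-id>"
}
```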

Using Azure Automation is just one option among many, and ultimately it doesn’t matter too much how you call the Data Factory APIs as long as your implementation works and is secure. With the information above you’ll be able to set up your own Automation Runbook, and all that’s left is orchestrating the whole thing in Data Factory – which is the important part. Before you start with the orchestration it’s good to outline the requirements: When do you need to run your SSIS packages? Are they run every day or only on workdays? Every hour, once a day, or maybe at set times in the morning and more often in the afternoon? Also, are you using a VNET or not? Starting up an SSIS integration runtime without a VNET takes roughly 5 minutes, but with a VNET it can take anywhere from 30 to 40 minutes, and this is something you have to take into account!

Once you’ve got your requirements listed, we can move on to:

Orchestrating Azure SSIS runtime automation

In the example case I’m going to walk through, the SSIS packages are run only on workdays: once every hour from UTC 2 AM to 6 AM, and every ten minutes from UTC 7 AM to UTC 3 PM. After UTC 3 PM the packages are not executed at all until 2 AM on the next workday. This is already a slightly more complicated scenario, since it requires us to take the weekday into account and to support two different execution frequencies in one solution. There are multiple ways to go about implementing this, but in this example I opted for a solution that’s as easy to understand as possible: I created two tumbling window triggers, one that executes every 60 minutes and another that executes every 10 minutes. Each has one associated pipeline which checks the current run time and performs orchestration accordingly. Here’s the pipeline that’s triggered every 60 minutes:

These If Conditions perform the following tasks:

  • If UTC hour >= 2 and <= 6 and not weekend -> Start SSIS runtime, execute SSIS package, stop SSIS runtime
  • If UTC hour == 7 and not weekend -> Start SSIS runtime, execute SSIS package
  • If UTC hour == 15 and not weekend -> Execute SSIS package, stop SSIS runtime
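As a sketch, the first condition could be written in Data Factory’s expression language like this (using utcnow() for simplicity; with tumbling window triggers you may prefer to evaluate the trigger’s window start time instead – dayOfWeek returns 0 for Sunday and 6 for Saturday):

```
@and(
    and(
        greaterOrEquals(int(formatDateTime(utcnow(), 'HH')), 2),
        lessOrEquals(int(formatDateTime(utcnow(), 'HH')), 6)
    ),
    and(
        not(equals(dayOfWeek(utcnow()), 0)),
        not(equals(dayOfWeek(utcnow()), 6))
    )
)
```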

Below is an example of what the activities inside the first If Condition might look like:

(Screenshot: the Start SSIS, Execute SSIS package and Stop SSIS activities chained inside the first If Condition.)

In this example situation the SSIS runtime is associated with a VNET, which means that starting it is going to take at least 30 minutes. Because of this the runtime is not stopped between UTC 7 AM and 3 PM, when the package is being run every 10 minutes. If a VNET integration was not required then it would be possible to start and stop the runtime between each of those executions, provided your SSIS package execution itself is quite fast.

What comes next is the meat of this implementation: starting the SSIS runtime. If you checked the Microsoft documentation I linked above, in theory this is very simple. But if you want an implementation that can handle random transient errors, you need to do a little more. The Data Factory APIs don’t have any webhooks or events that you could subscribe to, so the only way to get information on the status of the runtime is by polling the status endpoints. That’s what I’m going to do after starting the runtime: most of the time the runtime will eventually change from “Starting” to “Started”, but occasionally the operation fails and the status returns to “Stopped”. I want to catch these situations and then retry a few times. Also, by polling for status changes my pipeline execution can move on sooner than if I just waited for, say, 45 minutes every single time. Here’s my implementation as pseudo code:

Get Current SSIS_Status
Set Retry_Count to 0
WHILE
	SSIS_Status IS NOT 'Started'
	AND Retry_Count LESS THAN 3
THEN
	Start SSIS Runtime
	WHILE
		SSIS_Status IS 'Starting'
	WAIT 45 seconds
		AND Get Current SSIS_Status
	AFTERWARDS
		Increment Retry_Count
		Get Current SSIS_Status

There are a couple of things to note here when it comes to Data Factory. The first is that in Data Factory you cannot have nested loops in a single pipeline – which is exactly what my pseudo code has! In situations like this you need to split your pipeline into multiple pipelines, each containing one loop, and then call the nested pipeline from the parent loop. Here’s a version of the same pseudo code modified for Data Factory pipelines:

Pipeline 'SSIS Starter Retry Loop'
	Get Current SSIS_Status
	Set Retry_Count to 0
	WHILE
		SSIS_Status IS NOT 'Started'
		AND Retry_Count LESS THAN 3
	THEN
		Execute Pipeline 'SSIS Starter'
		AFTERWARDS
			Increment Retry_Count
			Get Current SSIS_Status
			
Pipeline 'SSIS Starter'
	Start SSIS Runtime
	WHILE
		SSIS_Status IS 'Starting'
	WAIT 45 seconds
		AND Get Current SSIS_Status

The second thing to be aware of is the difference between Data Factory’s Web and Webhook activities when calling Azure Automation runbooks. If you use a Web activity, the pipeline execution continues as soon as the called service returns any HTTP response. In the case of Azure Automation this will always be a message acknowledging that a runbook has been queued for execution. This is fine when we want to start or stop the runtime, but when we call a runbook to get the runtime’s status, we need the script’s return values as well! The Webhook activity lets us do this: it waits for the called service (in this case, the runbook) to call a callback URL which Data Factory generates for the activity. This mechanism allows us to get the SSIS runtime’s status back from Azure Automation.
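For illustration, a Webhook activity definition pointing at an Automation webhook could look roughly like this – the URL is a placeholder, and the timeout caps how long Data Factory waits for the callback before failing the activity:

```json
{
    "name": "Get SSIS Status",
    "type": "WebHook",
    "typeProperties": {
        "url": "https://<your-automation-webhook-url>",
        "method": "POST",
        "body": {},
        "timeout": "00:10:00"
    }
}
```

Data Factory appends the generated callBackUri to the body automatically before sending the request.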

Implementing the Data Factory pipelines

With all of that said, all that’s left is implementing the pipelines. In my example case I’ve got the following list of pipelines in my Data Factory:

  • SSIS Runner
  • SSIS Starter
  • SSIS Starter Retry Loop
  • SSIS Stopper
  • SSIS Timer Every 10 minutes
  • SSIS Timer Every 60 minutes

You’ve already seen the pipeline that executes every 60 minutes, and the 10-minute pipeline is very similar. SSIS Runner and SSIS Stopper are very simple as well: the runner pipeline simply executes an SSIS package, and the stopper calls the Azure Automation runbook to stop the SSIS integration runtime. It’s the two SSIS Starter pipelines that are a bit more complicated, but fortunately you’ve already seen the pseudo code for them. Here’s what the SSIS Starter Retry Loop looks like:

(Screenshot: a Webhook activity “Get SSIS Status Before Start”, a Set variable activity “Set SSIS Status Before Start” and an Until loop “Start SSIS Retry Loop” containing 5 activities.)

The Until loop above has two conditions: it checks that the retry count is less than 3 and that the SSIS status is not “Started.” The inside of the loop looks like this:

(Screenshot: an Execute Pipeline activity “Execute SSIS Starter”, a Webhook activity “Get SSIS Status After Starter”, and Set variable activities “Set Temp Variable to Retry Count”, “Increment Retry Count” and “Set SSIS Status After Starter”.)

You’ll see that inside the loop I’m calling the SSIS Starter pipeline because of the aforementioned nested loop limitation. You might also have noticed that I have to use three variables: “SSIS Status”, “Retry Count” and “Temp Variable.” That’s because in Data Factory’s Set variable activity you cannot modify a variable that’s used in the activity’s own expression; in other words, an activity that goes “Retry Count = Retry Count + 1” is illegal. Instead, what you have to do is this: “Temp Variable = Retry Count. Retry Count = Temp Variable + 1.” It’s a bit of a hassle, but works just fine.
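As a sketch, with String variables the two Set variable activities could use expressions like these (variable names as above):

```
Set Temp Variable:      @variables('RetryCount')
Increment Retry Count:  @string(add(int(variables('TempVariable')), 1))
```

The int/string conversions are needed because Data Factory variables don’t have a numeric type.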

Finally, the SSIS Starter pipeline. Here the Until loop waits for the SSIS status to become something other than “Starting”:

(Screenshot: a Web activity “Start SSIS”, a Set variable activity “Set SSIS Status to Starting”, an Until loop “Wait for SSIS” containing 3 activities, and a Wait activity “Wait 2 min”.)

And again, inside the Until loop:

(Screenshot: a Wait activity “Wait 45 seconds”, a Webhook activity “Get SSIS Status” and a Set variable activity “Set SSIS Status”.)

That’s pretty much it. I’ve opted to leave out the full Data Factory expressions because this post is quite lengthy as is, but you can check out the pipelines’ JSON definitions over at my GitHub repo. Technically there’s a non-zero chance that something might go wrong when stopping the SSIS runtime as well, so for an absolutely rock-solid implementation you would create a similar retry loop structure for the SSIS Stopper pipeline too. But for now, that’s enough from me – stay tuned and happy data wrangling!

