How to: Copy data from one Cosmos DB to another via PowerShell

A while back I had a case with a customer where they had a web app running in production without an up-to-date testing environment. A scenario that, unfortunately, is more common than we’d like to admit. We agreed that before I started working on the desired updates for the application I would first update the testing environment so we could use that for its intended purpose. For the most part this is a pretty straightforward process, but one peculiarity with this app was that it uses Cosmos DB as its database, and the test database was completely empty. I wanted to get data from the production database to help with any testing that we would be doing.

Microsoft has a list of migration choices in their documentation for copying Cosmos DB data into a different Cosmos DB account. Since my need was for a one-off data copy, most of these options were too complex to be sensible choices. Azure Data Factory could have worked, since setting it up is straightforward and fast. However, the Cosmos DB accounts in question were protected by VNets that block traffic originating from within Azure datacenters, which meant that with Azure Data Factory I would have had to set up a self-hosted integration runtime within the customer's VNet or on-premises.

Yeah, no, that’s pretty complicated again.

My options were narrowing down to either writing a command-line tool for the job or, if possible, a script. This led me to discover an open-source PowerShell module for CosmosDB. It looked promising, and after a few moments of playing around with it I had a fully functioning script for copying data from one Cosmos DB account to another. In addition to the CosmosDB module linked above, running this script requires Microsoft's own Azure PowerShell module.
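If you don't have these modules yet, both are published in the PowerShell Gallery, so installing them should be roughly a one-liner each (a quick sketch, assuming the gallery module names Az and CosmosDB):

#One-time setup: install the required modules from the PowerShell Gallery
Install-Module -Name Az -Scope CurrentUser        #Microsoft's Azure PowerShell module
Install-Module -Name CosmosDB -Scope CurrentUser  #Open-source CosmosDB module used by the script below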

The script, included below, first connects to Azure using your provided credentials. It then fetches data from a specific Cosmos DB collection’s partition using paging. The script uses 20 items per page, which can be modified depending on the size of your database items. Once all of the data from the partition has been fetched, it is then copied to the target Cosmos DB database. To copy data between Cosmos DB instances protected by VNets, run the script from a server or workstation that has network access to the databases.

Note! Since this script fetches all of the data being copied from the source database into memory before saving it to the target database, it is not recommended for copying large amounts of data between Cosmos DB instances. For large-scale migrations I recommend going through the trouble of setting up Azure Data Factory with a VNet-integrated runtime.

Before using the script you need to first configure it as follows:
1. Set the name of the Cosmos DB account to copy data from on line 1.
2. Set the name of the resource group containing the Cosmos DB account to copy data from on line 2.
3. Set the name of the Cosmos DB account to copy data to on line 3.
4. Set the name of the resource group containing the Cosmos DB account to copy data to on line 4.
5. Set the name of the Cosmos DB database to copy on line 6.
6. Set the name of the Cosmos DB collection to copy on line 7.
7. Set the partition key value to copy data from on line 8.

$source_account = '...'     #Name of your source Cosmos DB account
$source_rg = '...'          #Name of the Azure resource group containing your source Cosmos DB account
$target_account = '...'     #Name of your target Cosmos DB account
$target_rg = '...'          #Name of the Azure resource group containing your target Cosmos DB account

$database_name = '...'      #Replace with the name of the Cosmos DB database to copy over
$collection_name = '...'    #Replace with the name of the Cosmos DB container to copy over
$partition_key = '...'      #Replace with the partition key value to copy data from

$query = "SELECT * FROM $collection_name"   #Cosmos DB query for selecting all data from the container (not used by the loop below, which reads the partition directly)

$documentsPerRequest = 20   #Number of documents read in a single Cosmos DB request
$continuationToken = $null  #Continuation token for document read operation
$documents = @()            #Collection for all read Cosmos DB documents

Connect-AzAccount   #Connect to Azure

$source_context = New-CosmosDbContext -Account $source_account -Database $database_name -ResourceGroup $source_rg   #Init Cosmos DB connection to source account
$target_context = New-CosmosDbContext -Account $target_account -Database $database_name -ResourceGroup $target_rg   #Init Cosmos DB connection to target account

#Get all items from the source container in a loop
do {
    $responseHeader = $null
    $getCosmosDbDocumentParameters = @{
        Context = $source_context
        CollectionId = $collection_name
        MaxItemCount = $documentsPerRequest
        ResponseHeader = ([ref] $responseHeader)
        PartitionKey = $partition_key
    }

    if ($continuationToken) {
        $getCosmosDbDocumentParameters.ContinuationToken = $continuationToken
    }

    $documents += Get-CosmosDbDocument @getCosmosDbDocumentParameters
    $continuationToken = Get-CosmosDbContinuationToken -ResponseHeader $responseHeader
} while (-not [System.String]::IsNullOrEmpty($continuationToken))

#Loop through the retrieved items and add them to the target Cosmos DB container
foreach ($document in $documents) {
    New-CosmosDbDocument -Context $target_context -CollectionId $collection_name -PartitionKey $partition_key -DocumentBody (ConvertTo-Json $document -Depth 100) -Encoding 'UTF-8'   #-Depth keeps ConvertTo-Json from truncating nested document properties
}
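If you want a quick sanity check after the copy, the same cmdlets can be pointed at the target account to read the partition back and compare counts. A rough sketch, reusing the variables defined above (it doesn't page through results, so it is only indicative for small containers):

#Optional sanity check: read the copied partition back from the target container and compare counts
$copiedDocuments = Get-CosmosDbDocument -Context $target_context -CollectionId $collection_name -PartitionKey $partition_key
Write-Host "Source documents: $($documents.Count), target documents: $($copiedDocuments.Count)"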

Enjoy! 🙂

