Migration Tool for Azure Cosmos DB for MongoDB (vCore-Based)

Streamline your migration to Azure Cosmos DB for MongoDB (vCore‑based) with a reliable, easy‑to‑use web app. Choose your migration tool and migration mode (offline or online). The app orchestrates bulk copy and, for online jobs, change‑stream catch‑up. You don’t need to learn or use these command-line tools yourself.

Migration tool

  • MongoDump and MongoRestore: Uses mongo-tools. Best for moving from self-hosted or Atlas.
  • MongoDB Driver: Uses the driver for bulk copy and change stream catch-up.
  • MongoDB Driver (Cosmos DB RU read optimized): Optimized for Cosmos DB Mongo RU as source. Uses RU-friendly read patterns.

Migration mode

  • Offline: Snapshot-only copy. Job completes automatically when copy finishes.
  • Online: Copies bulk data, then processes change streams to catch up. Requires manual Cut Over to finish.

Key Features

  • Flexible Migration Options
    Supports both online and offline migrations to suit your business requirements, and multiple tools for reading data from the source and writing it to the target. If you are migrating from an on-premises MongoDB VM, consider using the On-Premises Deployment to install the app locally and transfer data to Azure Cosmos DB for MongoDB (vCore-based); this eliminates the need to set up an Azure VPN.

  • User-Friendly Interface
    No steep learning curve—simply provide your connection strings and specify the collections to migrate.

  • Automatic Resume
    Migration resumes automatically in case of connection loss, ensuring uninterrupted reliability.

  • Private Deployment
    Deploy the tool within your private virtual network (VNet) for enhanced security. (Update main.bicep for VNet configuration.)

  • Standalone Solution
    Operates independently, with no dependencies on other Azure resources.

  • Scalable Performance
    Select your Azure Web App pricing plan based on performance requirements:

    • Default: B1
    • Recommended for large workloads: Premium v3 P2V3 (Update main.bicep accordingly.)
  • Customizable
    Modify the provided C# code to suit your specific use cases.


Effortlessly migrate your MongoDB collections while maintaining control, security, and scalability. Begin your migration today and unlock the full potential of Azure Cosmos DB!

Azure Deployment

Follow these steps to migrate data from a cloud-based MongoDB VM or MongoDB Atlas. You can deploy the utility either by building it from the source files or using the precompiled binaries.

Prerequisites

  1. Azure Subscription
  2. Azure CLI Installed
  3. PowerShell

Deploy on Azure using Source Files (option 1)

This option involves cloning the repository and building the C# project source files locally on a Windows machine. If you’re not comfortable working with code, consider using Option 2 below.

  1. Install .NET SDK

  2. Clone/Download the repository: https://github.com/AzureCosmosDB/MongoMigrationWebBasedUtility

  3. Open PowerShell.

  4. Navigate to the cloned project folder.

  5. Run the following commands in PowerShell:

    # Variables to be updated
    $resourceGroupName = "<Replace with Existing Resource Group Name>"
    $webAppName = "<Replace with Web App Name>"
    $projectFolderPath = "<Replace with path to cloned repo on local>"
    
    # Paths - No changes required
    $projectFilePath = "$projectFolderPath\MongoMigrationWebApp\MongoMigrationWebApp.csproj"
    $publishFolder = "$projectFolderPath\publish"
    $zipPath = "$publishFolder\app.zip"
    
    # Login to Azure
    az login
    
    # Set subscription (optional)
    # az account set --subscription "your-subscription-id"
    
    # Deploy Azure Web App
    Write-Host "Deploying Azure Web App..."
    az deployment group create --resource-group $resourceGroupName --template-file main.bicep --parameters location=WestUs3 webAppName=$webAppName
    
    # Configure NuGet path (execute only once on a machine)
    dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
    
    # Delete the existing publish folder (if it exists)
    if (Test-Path $publishFolder) {
        Remove-Item -Path $publishFolder -Recurse -Force -Confirm:$false
    }
    
    # Build the Blazor app
    Write-Host "Building Blazor app..."
    dotnet publish $projectFilePath -c Release -o $publishFolder -warnaserror:none --nologo
    
    # Delete the existing zip file (if it exists)
    if (Test-Path $zipPath) {
        Remove-Item $zipPath -Force
    }
    
    # Archive published files
    Compress-Archive -Path "$publishFolder\*" -DestinationPath $zipPath -Update
    
    # Deploy files to Azure Web App
    Write-Host "Deploying to Azure Web App..."
    az webapp deploy --resource-group $resourceGroupName --name $webAppName --src-path $zipPath --type zip
    
    Write-Host "Deployment completed successfully!"
  6. Open https://<WebAppName>.azurewebsites.net to access the tool.

  7. Enable the use of a single public IP for consistent firewall rules or Enable Private Endpoint if required.

Deploy on Azure using precompiled binaries (option 2)

  1. Clone/Download the repository: https://github.com/AzureCosmosDB/MongoMigrationWebBasedUtility

  2. Download the .zip file (excluding source code.zip and source code.tar.gz) from the latest release available at https://github.com/AzureCosmosDB/MongoMigrationWebBasedUtility/releases.

  3. Open PowerShell.

  4. Navigate to the cloned project folder.

  5. Run the following commands in PowerShell:

    # Variables to be updated
    $resourceGroupName = "<Replace with Existing Resource Group Name>"
    $webAppName = "<Replace with Web App Name>"
    $zipPath = "<Replace with full path of downloaded latest release zip file on local>"
    
    # Login to Azure
    az login
    
    # Set subscription (optional)
    # az account set --subscription "your-subscription-id"
    
    # Deploy Azure Web App
    Write-Host "Deploying Azure Web App..."
    az deployment group create --resource-group $resourceGroupName --template-file main.bicep --parameters location=WestUs3 webAppName=$webAppName
    
    # Deploy files to Azure Web App
    Write-Host "Deploying to Azure Web App..."
    az webapp deploy --resource-group $resourceGroupName --name $webAppName --src-path $zipPath --type zip
    
    Write-Host "Deployment completed successfully!"
  6. Open https://<WebAppName>.azurewebsites.net to access the tool.

  7. Enable the use of a single public IP for consistent firewall rules or Enable Private Endpoint if required.

Enable App Authentication (Optional but Recommended)

The WebApp deployed using the above steps will be publicly accessible to anyone with the URL. To secure the application, it is recommended to enable App Service Authentication.

For more details, see Add app authentication to your web app.

Integrating Azure Web App with a VNet to Use a Single Public IP (Optional)

VNet Integration for MongoDB servers within a private Virtual Network (VNet)

Accessing MongoDB servers inside a private VNet requires the web app to be integrated with that VNet. Ensure VNet integration is configured for your application as described below.

1. Create a VNet (If Not Already Existing)

  1. Create a Virtual Network:

    • Go to Create a resource in the Azure Portal.
    • Search for Virtual Network and click Create.
    • Provide a Name for the VNet.
    • Choose the desired Region (ensure it matches the Web App's region for integration).
    • Define the Address Space (e.g., 10.0.0.0/16).
  2. Add a Subnet:

    • In the Subnet section, create a new subnet.
    • Provide a Name (e.g., WebAppSubnet).
    • Define the Subnet Address Range (e.g., 10.0.1.0/24).
    • Set the Subnet Delegation to Microsoft.Web for VNet integration.
  3. Create the VNet:

    • Click Review + Create and then Create.

2. Enable VNet Integration for the Web App

  1. Go to the Web App:

    • Navigate to your Azure Web App in the Azure Portal.
  2. Enable VNet Integration:

    • In the left-hand menu, select Networking.
    • Under Outbound Traffic, click VNet Integration.
    • Click Add VNet and choose an existing VNet and subnet.
    • Ensure the subnet is delegated to Microsoft.Web.
  3. Save the configuration.

3. Configure a NAT Gateway for the VNet

  1. Create a Public IP Address:

    • Go to Create a resource in the Azure Portal.
    • Search for Public IP Address and click Create.
    • Assign a name and ensure it's set to Static.
    • Complete the setup.
  2. Create a NAT Gateway:

    • Go to Create a resource in the Azure Portal.
    • Search for NAT Gateway and click Create.
    • Assign a name and link it to the Public IP Address created earlier.
    • Attach the NAT Gateway to the same subnet used for the Web App.
  3. Save the configuration.

4. Update Firewall Rules

  • In your firewall (e.g., Azure Firewall, third-party), allow traffic from the single public IP address used by the NAT Gateway.
  • Test access to MongoDB Source and Destination that have been configured with the new IP in their firewall rules.

Enable Private Endpoint on the Azure Web App (Optional)

Prerequisites

Ensure you have:

  • An existing Azure Virtual Network (VNet).
  • A subnet dedicated to private endpoints (e.g., PrivateEndpointSubnet).
  • Permissions to configure networking and private endpoints in Azure.

Enable Private Endpoint

1. Navigate to the Web App

  • Open the Azure Portal.
  • In the left-hand menu, select App Services.
  • Click the desired web app.

2. Access Networking Settings

  • In the web app's blade, select Networking.
  • Under the Private Endpoint section, click + Private Endpoint.

3. Create the Private Endpoint

Follow these steps in the Add Private Endpoint Advanced wizard:

a. Basics
  • Name: Enter a name for the private endpoint (e.g., WebAppPrivateEndpoint).
  • Region: Ensure it matches your VNet’s region.
b. Resource
  • Resource Type: Select Microsoft.Web/sites for the web app.
  • Resource: Choose your web app.
  • Target Sub-resource: Select sites.
c. Configuration
  • Virtual Network: Select your VNet.
  • Subnet: Choose the PrivateEndpointSubnet.
  • Integrate with private DNS zone: Enable this to link the private endpoint with an Azure Private DNS zone (recommended).

4. Review and Create

  • Click Next to review the configuration.
  • Click Create to finalize.

5. Verify Private Endpoint Connection

  • Return to the Networking tab of your web app.
  • Verify the private endpoint status under Private Endpoint Connections as Approved.

Configure DNS (Optional but Recommended)

If using a private DNS zone:

  1. Validate DNS Resolution: Run the following command to ensure DNS resolves to the private IP:

    nslookup <WebAppName>.azurewebsites.net
  2. Update Custom DNS Servers (if applicable): Configure custom DNS servers to resolve the private endpoint using Azure Private DNS.

Test Connectivity

  1. Deploy a VM in the same VNet.
  2. From the VM, access the web app via its URL (e.g., https://<WebAppName>.azurewebsites.net).
  3. Confirm that the web app is accessible only within the VNet.

On-Premises Deployment

Follow these steps to migrate data from an on-premises MongoDB VM. You can deploy the utility by either compiling the source files or using the precompiled binaries.

Steps to Deploy on a Windows Server

  1. Prepare the Windows Server

    • Log in to the Windows Server using Remote Desktop or a similar method.
    • Ensure the server has internet access to download required components.
  2. Install .NET 9 Runtime

    • Download the .NET 9 Hosting Bundle:
      • Visit the .NET Download Page.
      • Download the Hosting Bundle under the Runtime section.
    • Install the Hosting Bundle:
      • Run the installer and follow the instructions.
      • This will install the ASP.NET Core Runtime and configure IIS to work with .NET applications.
  3. Install and Configure IIS (Internet Information Services)

    • Open Server Manager.
    • Click Add Roles and Features and follow these steps:
      • Select Role-based or feature-based installation.
      • Choose the current server.
      • Under Server Roles, select Web Server (IIS).
      • In the Role Services section, ensure the following are selected:
        • Web Server > Common HTTP Features > Static Content, Default Document.
        • Web Server > Application Development > .NET Extensibility 4.8, ASP.NET 4.8.
        • Management Tools > IIS Management Console.
      • Click Install to complete the setup.
  4. Enable Required Windows Features by running the following PowerShell command

    Enable-WindowsOptionalFeature -Online -FeatureName IIS-ASPNET45, IIS-NetFxExtensibility45
    
  5. Create the App Directory

    • On the server, create an empty folder to store the MongoMigrationWebApp files. This folder will be referred to as the app directory.
    • Recommended path: C:\inetpub\wwwroot\MongoMigrationWeb
    • Configure File Permissions
      • Right-click the app directory, select Properties > Security.
      • Ensure that the IIS_IUSRS group has Read & Execute permissions.
  6. Configure IIS for WebApp

    • Open IIS Manager (search IIS in the Start menu).
    • Right-click Sites in the left-hand pane and select Add Website.
      • Site Name: Enter a name for your site (e.g., MongoMigrationWeb).
      • Physical Path: Point to the app directory.
      • Binding: Configure the site to use the desired protocol and port (e.g., 8080 for HTTP or 443 for HTTPS).
  7. Set Up Application Pool

    • In IIS Manager, select Application Pools.
    • Create a new Application Pool:
      • Right-click and select Add Application Pool.
      • Name it (e.g., MongoMigrationWebPool).
      • Set the .NET CLR version to No Managed Code.
    • Select the site, click Basic Settings, and set the application pool to the one created.
  8. Deploy the binaries to the app directory

    • Use precompiled binaries
      • Download the .zip file (excluding source code.zip and source code.tar.gz) from the latest release available at https://github.com/AzureCosmosDB/MongoMigrationWebBasedUtility/releases.
      • Unzip the files to the app directory.
      • Ensure the web.config file is present in the root of your app directory. This file is critical for configuring IIS hosting.
    • Or use Source Files
      1. Install .NET SDK

      2. Clone/Download the repository: https://github.com/AzureCosmosDB/MongoMigrationWebBasedUtility

      3. Open PowerShell.

      4. Navigate to the cloned project folder.

      5. Run the following commands in PowerShell:

        ```powershell
        # Variables to be updated
        $webAppName = "<Replace with Web App Name>"
        $projectFolderPath = "<Replace with path to cloned repo on local>"
        
        # Paths - No changes required
        $projectFilePath = "$projectFolderPath\MongoMigrationWebApp\MongoMigrationWebApp.csproj"
        $publishFolder = "$projectFolderPath\publish"
        $zipPath = "$publishFolder\app.zip"
        
        # Configure NuGet path (execute only once on a machine)
        dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
        
        # Build the Blazor app
        Write-Host "Building Blazor app..."
        dotnet publish $projectFilePath -c Release -o $publishFolder -warnaserror:none --nologo
        
        Write-Host "Published to $publishFolder"
        ```
        
      6. Copy the contents of the published folder to the app directory.

  9. Restart IIS from the right-hand pane.

How to Use

Add a New Job

  1. Go to the home page: https://<WebAppName>.azurewebsites.net and click New Job.
  2. In the New Job Details pop-up, enter all required information.
  3. Refer to Collections input formats to learn the different ways to provide the collection list for migration. If necessary, use the Get List of Collections steps to create a comma-separated list of collection names.
  4. Choose the migration tool: MongoDump and MongoRestore, MongoDB Driver, or MongoDB Driver (Cosmos DB RU read optimized).
  5. Select the desired migration mode.
  6. Select Post Migration Sync Back to enable syncing back to the source after migration; this helps reduce risk by allowing a rollback to the original server if needed.
  7. Select Append Mode to preserve existing collection(s) on the target without deleting them.
  8. Select Skip Indexes to prevent the tool from copying indexes from the source.
  9. Once all fields are filled, select OK.
  10. The job will automatically start if no other jobs are running.

Note: For the Mongo Dump/Restore option, the Web App will download the mongo-tools from the URL specified in the Web App settings. Ensure that the Web App has access to this URL. If the Web App does not have internet access, you can download the mongo-tools zip file to your development machine, then copy it to the wwwroot folder inside the published folder before compressing it. Afterward, update the URL in the Web App settings to point to the Web App’s URL (e.g., https://<WebAppName>.azurewebsites.net/<zipfilename.zip>).

Migration modes

Migrations can be done in two ways:

  • Offline Migration: A snapshot-based bulk copy from source to target. New data added, updated, or deleted on the source after the snapshot isn't copied to the target. The application downtime required depends on the time taken for the bulk copy activity to complete.

  • Online Migration: Apart from the bulk data copy activity done in the offline migration, a change stream monitors all additions/updates/deletes. After the bulk data copy is completed, the data in the change stream is copied to the target to ensure that all updates made during the migration process are also transferred to the target. The application downtime required is minimal.
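
For illustration, the catch-up phase of an online job is conceptually a MongoDB change stream replay. The mongosh sketch below is not the utility's internal code; the database and collection names are placeholders.

// Watch a source collection and print the events the utility would replay on the target.
const watchCursor = db.getSiblingDB("SalesDB").getCollection("Orders")
    .watch([], { fullDocument: "updateLookup" });

while (!watchCursor.isClosed()) {
    if (watchCursor.hasNext()) {
        const change = watchCursor.next();
        // The utility applies events like this to the target and periodically
        // checkpoints change._id (the resume token) so it can resume after a pause.
        print(`${change.operationType} on ${JSON.stringify(change.documentKey)}`);
    }
}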

Oplog retention size

For online jobs, ensure that the oplog retention size of the source MongoDB is large enough to store operations for at least the duration of both the download and upload activities. If the oplog retention size is too small and there is a high volume of write operations, the online migration may fail or be unable to read all documents from the change stream in time.
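
As a quick check before starting an online job, you can inspect the source oplog window in mongosh (this assumes the source is a replica set and your user can read the local database):

rs.printReplicationInfo();          // configured oplog size and log length (start to end)
printjson(db.getReplicationInfo()); // same data as an object (logSizeMB, timeDiff, tFirst, tLast, ...)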

Get List of Collections

Run the script below in the mongo shell to list all authorized databases and collections in the dbname.collectionname format.

Set currentOnly=true to list collections from the current database. To list collections across all databases (excluding system collections and system databases), set currentOnly=false.

// Optional boolean to restrict to current DB only
const currentOnly = true;

print("-------------------------------- ");

function isSystemCollection(name) {
    return name.startsWith("system.");
}

function isSystemDatabase(name) {
    return (
        name === "admin" ||
        name === "config" ||
        name === "local" ||
        name.startsWith("system")
    );
}

const result = [];

function listCollectionsSafely(dbName) {
    if (isSystemDatabase(dbName)) {
        print(`⚠️ Skipping system database: ${dbName}`);
        return;
    }
    try {
        const currentDb = db.getSiblingDB(dbName);
        const collections = currentDb.getCollectionNames().filter(c => !isSystemCollection(c));
        collections.forEach(c => result.push(`${dbName}.${c}`));
    } catch (err) {
        print(`⚠️ Skipping ${dbName}: ${err.message}`);
    }
}

if (currentOnly) {
    // Use only the current database (skip if it’s system)
    listCollectionsSafely(db.getName());
} else {
    // Enumerate all databases
    const dbs = db.adminCommand({ listDatabases: 1 }).databases;
    dbs.forEach(d => listCollectionsSafely(d.name));
}

print(" ");
print("******OUTPUT****************");
// Print result as CSV (db.coll,db.coll,...)
print(result.join(","));
print("-------------------------------- ");

Sequencing your collections

The job processes collections in the order they are added. Since larger collections take more time to migrate, it’s best to arrange the collections in descending order of their size or document count.
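
If you want to derive that ordering quickly, a small mongosh helper along these lines lists a database's collections by estimated document count, largest first (the database name is a placeholder):

const dbName = "SalesDB";                       // replace with your database name
const d = db.getSiblingDB(dbName);
d.getCollectionNames()
    .filter(c => !c.startsWith("system."))
    .map(c => ({ name: `${dbName}.${c}`, docs: d.getCollection(c).estimatedDocumentCount() }))
    .sort((a, b) => b.docs - a.docs)
    .forEach(c => print(`${c.name}  (~${c.docs} docs)`));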

View a Job

  1. From the home page

  2. Select the eye icon corresponding to the job you want to view.

  3. On the Job Details page, you will see the collections to be migrated and their status in a tabular format.

  4. Depending on the job's status, one or more of the following buttons will be visible:

    • Resume Job: Visible if the migration is paused. Select this to resume the current job. The app may prompt you to provide the connection strings if the cache has expired.
    • Pause Job: Visible if the current job is running. Select this to pause the current job. You can resume it later.
    • Update Collections: Select this to add/remove collections in the current job. You can only update collections for a paused job. Removing a collection that is partially or fully migrated will lead to the loss of its migration details, and you will need to remigrate it from the start.
    • Cut Over: Select this to cut over an online job when the source and target are completely synced. Before cutting over, stop write traffic to the source and wait for the Time Since Last Change to reach zero for all collections. Once cut over is performed, there is no rollback.
  5. The Monitor section lists the current actions being performed.

  6. The Logs section displays system-generated logs for debugging. You can download the logs by selecting the download icon next to the header.

Note: An offline job will automatically terminate once the data is copied. However, an online job requires a manual cut over to complete.

Time Since Last Change

Time Since Last Change refers to the time difference between the timestamp of the last processed change and the current time. During an online migration, the lag will be high immediately after the upload completes, but it should decrease as change stream processing starts, eventually reaching zero. If the lag does not reduce, consider the following:

  • Ensure the job is not paused and is processing requests. Resume the job if necessary.
  • Monitor for new write operations on the source. If no new changes are detected, the lag will increase. However, this is not an issue since all changes have already been processed.
  • Check whether the transactions per second on the source are very high; if so, you may need a larger App Service plan or a dedicated web app for the collection (a rough way to measure this is sketched below).
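
A hedged sketch of that measurement: sample the server's opcounters twice in mongosh against the source to estimate writes per second.

const before = db.serverStatus().opcounters;
sleep(10000);                                   // mongosh helper; waits 10 seconds
const after = db.serverStatus().opcounters;
const writesPerSec = ((after.insert - before.insert) +
                      (after.update - before.update) +
                      (after.delete - before.delete)) / 10;
print(`Approximate writes/sec on the source: ${writesPerSec}`);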

Time Since Sync Back

Time Since Sync Back indicates the time elapsed between the most recent Sync Back operation and the current time. This metric becomes relevant after the application cutover, once the target account begins receiving traffic.

Initially, you may not see updates here until the application cutover is complete. Once active, the value should update regularly. If it doesn't, consider the following:

  • Confirm that the target account is live and actively receiving traffic.
  • Ensure the Sync Back process is running and not paused.
  • Monitor for incoming changes on the target. If no new writes are occurring, the lag may grow, which is expected in the absence of new data.

Update Web App Settings

  1. From the home page
  2. Select the gear icon to open the settings page.

Remove a Job

  1. From the home page
  2. Select the bin icon corresponding to the job you want to remove.

Download Job Details

  1. From the home page
  2. Select the download icon next to the job title to download the job details as JSON. This may be used for debugging purposes.

Job options and behaviors

When creating or resuming a job, you can tailor behavior via these options:

  • Migration tool

    • MongoDump and MongoRestore: Uses mongo-tools. Best for moving from self-hosted or Atlas to Cosmos DB. Requires access to the mongo-tools ZIP URL configured in Settings.
    • MongoDB Driver: Uses the driver for bulk copy and change stream catch-up.
    • MongoDB Driver (Cosmos DB RU read optimized): Optimized for Azure Cosmos DB for MongoDB (RU) as the source. Skips index creation and uses RU-friendly read patterns. Filters on collections are not supported for this mode.
  • Migration mode

    • Offline: Snapshot-only copy. Job completes automatically when copy finishes.
    • Online: Copies bulk data, then processes change streams to catch up. Requires manual Cut Over to finish.
  • Append Mode

    • If ON: Keeps existing target data and appends new documents. No collection drop. Good for incremental top-ups.
    • If OFF: Target collections are overwritten. A confirmation checkbox is required to proceed.
  • Skip Indexes

    • If ON: Skips index creation on target (data only). Forced ON for RU-optimized copy. You can create indexes separately before migration.
  • Change Stream Modes

    • Delayed: Start change stream processing after all collections are completed. This mode is ideal when migrating a large number of collections.
    • Immediate: Start change stream processing immediately as each collection is processed.
    • Aggressive: Use aggressive change stream processing when the oplog is small or the write rate is very high. Avoid this mode if a large number of collections need to be migrated.
  • Post Migration Sync Back

    • For online jobs, after Cut Over you can enable syncing from target back to source. This reduces rollback risk. UI shows Time Since Sync Back once active.
  • All collections use ObjectId for the _id field

    • If ON: Automatically applies "DataTypeFor_Id": "ObjectId" to all collections in the job, enabling ObjectId-specific optimizations.
    • Benefits: Faster partitioning (6x speedup), optimized chunk boundaries, leverages timestamp-based partitioning strategies.
    • If OFF: Each collection's _id type is auto-detected, or you can specify DataTypeFor_Id individually via JSON format.
    • See Multiple Partitioners for ObjectId for performance details.
  • Simulation Mode (No Writes to Target)

    • Runs a dry-run where reads occur but no writes are performed on target. Useful for validation and sizing.

Collections input formats

You can specify collections in two ways:

  1. CSV list
  • Example: db1.col1,db1.col2,db2.colA
  • Order matters. Larger collections should appear first to reduce overall time.
  2. CSV list with wildcards
  • Example: db1.*,*.users,*.*
  • If the collection count is large, consider splitting it into multiple jobs.
  3. JSON list with optional filters

Notes:

  • Filters must be valid MongoDB query JSON (as a string). Only basic operators ($eq, $lt, $lte, $gt, $gte, $in) on root-level fields are supported. Filters apply to both the bulk copy and the change stream.
  • Specify DataTypeFor_Id if the collection contains only a single data type for the _id field. Supported values are: ObjectId, Int, Int64, Decimal128, Date, BinData, String, and Object.
  • RU-optimized copy does not support filters or DataTypeFor_Id; provide only DatabaseName and CollectionName.
  • System collections are not supported.

CollectionInfoFormat JSON Format

[
   { "CollectionName": "Customers", "DatabaseName": "SalesDB", "Filter": "{ \"status\": \"active\"}" },
   { "CollectionName": "Orders", "DatabaseName": "SalesDB", "Filter": "{ \"orderDate\": { \"$gte\": { \"$date\": \"2024-01-01T00:00:00Z\" } } }" },
   { "CollectionName": "Products", "DatabaseName": "InventoryDB", "DataTypeFor_Id": "ObjectId" }
]
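
Before submitting a job, you can sanity-check a filter string in mongosh by parsing it and running it with countDocuments(). The snippet below mirrors the Customers example above; extended JSON values such as $date are interpreted by the utility, so treat this only as a quick syntax and match-count check.

const filterString = "{ \"status\": \"active\" }";       // the same string you would paste into the job
const filter = JSON.parse(filterString);                 // must be valid JSON with root-level fields
const count = db.getSiblingDB("SalesDB").getCollection("Customers").countDocuments(filter);
print(`${count} documents match the filter`);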

How To Guide

Dynamic Scaling of MongoDump/Restore Workers

The migration tool supports runtime adjustment of parallel worker counts for both dump and restore operations, allowing you to optimize performance based on system load and resource availability.

Understanding Worker Pools

The tool uses three types of workers during MongoDump/Restore migration:

  1. Dump Workers: Control how many mongodump processes run in parallel to extract data from the source
  2. Restore Workers: Control how many mongorestore processes run in parallel to load data into the target
  3. Insertion Workers: Control the --numInsertionWorkersPerCollection parameter for each mongorestore process

How to Adjust Workers During Migration

  1. Navigate to Job Viewer: Open the job that is currently running
  2. Locate the Worker Controls: Look for the worker adjustment section showing current worker counts
  3. Adjust Workers:
    • Dump Workers: Use the +/- buttons to increase or decrease parallel dump processes (range: 1-16)
    • Restore Workers: Use the +/- buttons to increase or decrease parallel restore processes (range: 1-16)
    • Insertion Workers: Adjust the number of insertion threads per restore process (range: 1-16)

How Dynamic Scaling Works

Increasing Workers (Scale Up):

  • When you increase dump/restore workers during active processing, new worker tasks are spawned immediately
  • New workers join the existing workers to process remaining chunks from the queue
  • You'll see log messages like: "Spawning 2 additional dump workers to process queue"
  • The semaphore capacity increases, allowing more concurrent operations

Decreasing Workers (Scale Down):

  • When you decrease worker count, the system reduces available capacity using "blocker tasks"
  • Active workers continue processing their current chunks
  • No new chunks are assigned beyond the reduced capacity
  • Workers complete gracefully without interruption
  • You'll see log messages showing the capacity change: "Dump: Decreased from 5 to 3 (-2)"

Best Practices

When to Increase Workers:

  • System CPU/memory utilization is low (< 60%)
  • Network bandwidth is underutilized
  • You have many small collections or chunks remaining
  • Database throughput on source/target can handle higher load

When to Decrease Workers:

  • System resources are constrained (high CPU/memory)
  • Database is experiencing throttling or high latency
  • You want to reduce impact on production systems
  • Other applications on the same server need resources

Optimal Settings:

  • Dump Workers: Start with 2-4 for most workloads; increase to 6-8 for high-throughput systems
  • Restore Workers: Match or slightly exceed dump workers if target can handle the load
  • Insertion Workers: Default is ProcessorCount/2; increase to 8-12 for large collections with high write capacity

Note: All worker pools always use the parallel infrastructure, even when set to 1. This ensures dynamic scaling works at any time during migration.


Difference Between Immediate Pause and Controlled Pause

The migration tool offers two pause mechanisms with different behaviors:

Immediate Pause (Hard Stop)

How to Trigger: Click the Pause Job button in Job Viewer

Behavior:

  • Cancellation token is immediately triggered
  • All active workers check the cancellation token and exit as soon as possible
  • Currently executing mongodump/mongorestore processes are killed immediately
  • In-progress chunks may be left incomplete
  • Job state is saved with partial progress

When to Use:

  • Emergency stop required (system issues, unexpected errors)
  • Need to free resources immediately
  • Discovered configuration errors that need correction
  • Critical production issue requires stopping migration

Resume Behavior:

  • Job resumes from the last saved state
  • Incomplete chunks are retried from the beginning
  • Change streams resume from last saved token

Controlled Pause (Graceful Stop)

How to Trigger: Click the Controlled Pause button in Job Viewer

Behavior:

  • Sets _controlledPauseRequested flag to true
  • Active workers complete their current chunks before exiting
  • No new chunks are dequeued from the work queue
  • mongodump/mongorestore processes finish naturally
  • All progress is saved cleanly
  • When dump completes during controlled pause, restore does NOT auto-start

Log Messages:

DumpRestoreProcessor: Controlled pause - no new chunks will be queued
Dump worker 1: Controlled pause active - exiting
Dump worker 2: Controlled pause active - exiting
All dump chunks completed during controlled pause
{dbName}.{colName} dump complete, but controlled pause active - restore will not start automatically

When to Use:

  • Planned maintenance window approaching
  • Want to avoid re-processing in-flight chunks
  • Need clean checkpoint for later resume
  • System resources need temporary reallocation
  • End of business day, want to pause cleanly overnight

Resume Behavior:

  • Job resumes exactly where it left off
  • No chunks need to be re-processed
  • State is consistent and complete
  • For collections where dump completed during pause, resume will trigger restore automatically

Comparison Table

| Feature | Immediate Pause | Controlled Pause |
| --- | --- | --- |
| Stop Speed | Instant | 30 seconds to 2 minutes |
| Process Handling | Kill immediately | Complete gracefully |
| Chunk State | May be incomplete | Always complete |
| Data Safety | Safe, but some rework | Safest, no rework |
| Resource Cleanup | Immediate | Gradual |
| Resume Efficiency | Some chunks retry | Maximum efficiency |
| Use Case | Emergency | Planned pause |

Best Practices

For Controlled Pause:

  1. Allow 1-2 minutes for workers to complete current chunks
  2. Monitor logs to confirm all workers have exited cleanly
  3. Verify job status shows "Paused" before closing browser

For Immediate Pause:

  1. Use when time is critical
  2. Expect some chunks to be reprocessed on resume
  3. Check for any killed process remnants if issues persist

Note: Both pause types respect the job lifecycle. On resume, the job continues from its saved state. For online migrations, change streams automatically resume from the last saved token.


Multiple Partitioners for ObjectId and Benefits of Selecting DataType for _id

Understanding Data Partitioning

When migrating large collections, the tool splits each collection into chunks based on ranges of the _id field (a sketch of a chunk-bounded read follows the list below). This enables:

  • Parallel processing of different chunks simultaneously
  • Better memory management (smaller working sets)
  • Faster overall migration through concurrency
  • Ability to retry failed chunks without re-processing entire collection
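
For illustration, a chunk is simply a bounded _id range, so each parallel worker performs a read conceptually like the mongosh sketch below. The boundary values and names are placeholders, not the tool's internal query builder.

const lower = ObjectId("650000000000000000000000");
const upper = ObjectId("660000000000000000000000");
db.getSiblingDB("SalesDB").getCollection("Orders")
    .find({ _id: { $gte: lower, $lt: upper } })
    .sort({ _id: 1 })
    .forEach(doc => {
        // each document in this chunk would be written to the target here
    });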

Partitioner Types

The tool offers four partitioning strategies specifically optimized for ObjectId data types:

1. Use Time Boundaries (Recommended for most workloads)

How it Works:

  • Extracts the embedded timestamp from ObjectId values
  • Calculates equidistant time-based boundaries using BigInteger arithmetic
  • Partitions based on time ranges distributed evenly across the collection's time span
  • Creates chunk boundaries at specific ObjectId values representing evenly spaced time intervals (see the sketch after this list)

Benefits:

  • Chronologically aligned: Chunks correspond to time periods
  • Predictable distribution: Recent data tends to be in later chunks
  • Efficient for time-series data: Natural ordering matches data insertion patterns
  • Fast boundary calculation: Uses mathematical distribution without sampling
  • Optimal for append-heavy workloads: Newer documents cluster together

Best For:

  • Collections where documents are primarily inserted chronologically
  • Time-series data (logs, events, transactions)
  • Collections with steady insert rate over time
  • Large collections (millions of documents) where sampling would be expensive
2. Use Adjusted Time Boundaries (Best for balanced chunks)

How it Works:

  • First generates time-based boundaries (same as #1 above)
  • Then validates actual record counts in each range
  • Adjusts boundaries to ensure each chunk has 1K-1M records
  • Merges small ranges and splits large ranges dynamically

Benefits:

  • Balanced chunk sizes: Each chunk has similar document count (1K-1M records)
  • Handles uneven distribution: Adapts when writes are clustered in time
  • Prevents tiny/huge chunks: Automatic validation and adjustment
  • Optimal parallelism: Even workload distribution across workers

Best For:

  • Collections with non-uniform write patterns (bursts of activity)
  • Variable document sizes where time ranges don't correlate with record counts
  • When you need predictable chunk processing times
  • Production workloads requiring stable performance
3. Use Pagination (Deterministic equal-sized chunks)

How it Works:

  • Calculates total document count and desired records per range
  • Uses progressive $gt filters with skip/limit to sample ObjectIds at regular intervals
  • Creates boundaries by paginating through the collection (e.g., every 100K records)
  • Avoids loading all documents into memory

Benefits:

  • Equal chunk sizes: Each chunk has the same number of records
  • Memory efficient: Progressive filtering instead of full collection scan
  • Deterministic: Consistent chunk boundaries based on record position
  • Works with any _id distribution: Doesn't depend on time properties

Best For:

  • Collections with random or unpredictable ObjectId patterns
  • When you need exact control over chunk size
  • Collections where time-based approaches produce imbalanced chunks
  • Debugging scenarios requiring predictable record counts per chunk
4. Use Sample Command (Fallback for small collections)

How it Works:

  • Uses MongoDB's $sample aggregation command to randomly sample ObjectIds
  • Sorts sampled values and creates boundaries
  • Quick and simple approach for smaller datasets

Benefits:

  • Simple implementation: Leverages MongoDB's built-in sampling
  • Low overhead: Single aggregation query
  • Works universally: No assumptions about data distribution

Best For:

  • Small to medium collections (< 1M documents)
  • Quick migrations where optimization isn't critical
  • Collections where other methods fail or timeout
  • Development/testing scenarios
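
As a rough illustration of the time-boundary idea from strategy 1 above (not the tool's exact algorithm), the mongosh sketch below derives equidistant boundaries from the timestamps embedded in the smallest and largest ObjectId. Database, collection, and chunk count are placeholders, and the collection is assumed to be non-empty.

const coll = db.getSiblingDB("SalesDB").getCollection("Orders");
const chunks = 8;

const minId = coll.find().sort({ _id: 1 }).limit(1).next()._id;
const maxId = coll.find().sort({ _id: -1 }).limit(1).next()._id;
const minSec = Math.floor(minId.getTimestamp().getTime() / 1000);
const maxSec = Math.floor(maxId.getTimestamp().getTime() / 1000);

const boundaries = [];
for (let i = 1; i < chunks; i++) {
    const sec = minSec + Math.floor(((maxSec - minSec) * i) / chunks);
    // An ObjectId whose first 4 bytes encode "sec" and whose remaining 8 bytes are zero
    boundaries.push(ObjectId(sec.toString(16).padStart(8, "0") + "0000000000000000"));
}
printjson(boundaries);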

Specifying DataType for _id: Why It Matters

When you specify "DataTypeFor_Id": "ObjectId" in the collection configuration, you unlock significant performance benefits:

Without DataType Specified (Generic Partitioning)
{ "CollectionName": "Orders", "DatabaseName": "SalesDB" }

Behavior:

  • Tool must scan collection to determine _id data type
  • Uses generic partitioning suitable for mixed or unknown types
  • Performs additional type checks during processing
  • Cannot leverage ObjectId-specific optimizations
  • Slower chunk boundary calculation
With DataType Specified (ObjectId)
{ "CollectionName": "Orders", "DatabaseName": "SalesDB", "DataTypeFor_Id": "ObjectId" }

Behavior:

  • Tool immediately selects the optimal partitioner for ObjectId
  • Skips type detection overhead (saves 1-2 queries per collection)
  • Enables timestamp-based partitioning by default
  • Optimizes query generation with ObjectId-specific operators
  • Faster chunk boundary calculation using ObjectId properties

Performance Impact

Collection with 10M documents:

  • Without DataType: ~30 seconds to analyze and partition
  • With DataType (ObjectId): ~5 seconds to partition
  • Speedup: 6x faster startup

Chunk Processing:

  • Without DataType: Generic queries like { "_id": { "$gte": <value>, "$lt": <value> } }
  • With DataType (ObjectId): Optimized queries leveraging timestamp component
  • Database efficiency: Better index utilization, fewer type coercion operations

Supported DataType Values

You can specify these values for DataTypeFor_Id:

| DataType | Use Case | Partitioning Strategy |
| --- | --- | --- |
| ObjectId | Standard MongoDB collections | Timestamp-based or random sampling |
| Int | Integer _id fields | Numeric range partitioning |
| Int64 | Long integer _id | Large numeric range partitioning |
| Decimal128 | High-precision numeric _id | Decimal range partitioning |
| Date | DateTime _id values | Chronological partitioning |
| BinData | Binary GUID/UUID _id | Binary range partitioning |
| String | String-based _id | Lexicographic partitioning |
| Object | Compound _id objects | Hash-based partitioning |

How to Configure

Using CollectionInfoFormat JSON:

[
    {
        "CollectionName": "Users",
        "DatabaseName": "AppDB",
        "DataTypeFor_Id": "ObjectId"
    },
    {
        "CollectionName": "Sessions",
        "DatabaseName": "AppDB",
        "DataTypeFor_Id": "String"
    },
    {
        "CollectionName": "Analytics",
        "DatabaseName": "AppDB",
        "DataTypeFor_Id": "Date"
    }
]

Best Practices

Always Specify DataType When:

  • Collection has more than 1 million documents
  • You know the _id field type is consistent
  • Performance is critical (large-scale migrations)
  • Collection uses ObjectId for _id (most common case)

Let Tool Auto-Detect When:

  • Collection has mixed _id types (rare, but possible)
  • Small collections (< 100K documents) where overhead is negligible
  • Unsure about _id type consistency
  • Testing/development scenarios

Verification:

  1. Before migration, run this query to check _id type consistency:
db.collection.aggregate([
    {
        $group: {
            _id: { $type: "$_id" },
            count: { $sum: 1 }
        }
    }
])
  2. If the result shows a single type (e.g., "objectId"), specify that type for optimal performance.

Advanced: Chunk Size Tuning

When using ObjectId with specified DataType, you can optimize chunk count:

For Time-Based Partitioning:

  • Collections with ~1M docs/month: Use monthly chunks (12 chunks/year)
  • Collections with ~1M docs/week: Use weekly chunks (52 chunks/year)
  • Collections with ~1M docs/day: Use daily chunks (365 chunks/year)

The tool automatically calculates optimal chunk count based on:

  • Total document count estimate
  • Target chunk size (configurable in Settings)
  • ObjectId timestamp distribution

Result: Balanced parallelism without creating too many tiny chunks or too few large chunks.
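
For a back-of-envelope check of the resulting chunk count, assuming a hypothetical target of roughly one million documents per chunk (the actual target chunk size is configurable in Settings):

const totalDocs = db.getSiblingDB("SalesDB").getCollection("Orders").estimatedDocumentCount();
const targetDocsPerChunk = 1000000;
print(`~${Math.max(1, Math.ceil(totalDocs / targetDocsPerChunk))} chunks expected`);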


Settings (gear icon)

These settings are persisted per app instance and affect all jobs:

  • Mongo tools download URL

    • HTTPS ZIP URL to mongo-tools used by Dump/Restore. If your app has no internet egress, upload the ZIP alongside your app content and point this URL to your app’s public URL of the file.
    • Must start with https:// and end with .zip
  • Binary format utilized for the _id

    • Use when your source uses binary GUIDs for _id.
  • Chunk size (MB) for mongodump

    • Range: 2–5120. Affects download-and-upload batching for Dump/Restore.
  • Mongo driver page size

    • Range: 50–40000. Controls batch size for driver-based bulk reads/writes.
  • Change stream max docs per batch

    • Range: 100–10000. Larger batches reduce overhead but increase memory/latency.
  • Change stream batch duration (max/min, seconds)

    • Max range: 20–3600; Min range: 10–600; Min must be less than Max. Controls how long change processors run per batch window.
  • Max collections per change stream batch

    • Range: 1–30. Concurrency limit when processing multiple collections.
  • Sample size for hash comparison

    • Range: 5–2000. Used by the “Run Hash Check” feature to spot-check document parity.
  • ObjectId Partitioner

    • Choose the partitioning method for ObjectId-based collections:
      • Use Sample Command: Uses MongoDB's $sample command (best for small collections)
      • Use Time Boundaries: Time-based boundaries using ObjectId timestamps (recommended for most workloads)
      • Use Adjusted Time Boundaries: Time-based boundaries adjusted by actual record counts (best for balanced chunks)
      • Use Pagination: Pagination-based boundaries for equal-sized chunks (deterministic)
    • See Multiple Partitioners for ObjectId for detailed explanations.
  • CA certificate file for source server (.pem)

    • Paste/upload the PEM (CA chain) if your source requires a custom CA to establish TLS.

Advanced notes:

  • App setting AllowMongoDump (see MongoMigrationWebApp/appsettings.json) toggles whether the “MongoDump and MongoRestore” option is available in the UI.
  • The app’s working folder defaults to the system temp path, or to %ResourceDrive%\home\ when present (e.g., on Azure App Service). It stores job state under migrationjobs and logs under migrationlogs.

Job lifecycle controls in Job Viewer

  • Resume Job: Resume with updated or existing connection strings.
  • Pause Job: Safely pause the running job.
  • Cut Over: For online jobs, enabled when change stream lag reaches zero for all collections. You can choose Cut Over with or without Sync Back.
  • Update Collections: Add/remove collections on a paused job. Removing a collection discards its migration and change stream state.
  • Reset Change Stream: For selected collections, reset the checkpoint to reprocess from the beginning. Useful if you suspect missed events.
  • Run Hash Check: Randomly samples documents and compares hashes between source and target to detect mismatches. Controlled by the Settings sample size.
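
If you want to spot-check parity manually, outside the built-in Run Hash Check, a rough mongosh comparison looks like the sketch below. This is not the tool's hash algorithm; the target URI and names are placeholders, and differences in field order can produce false mismatches.

const targetUri = "mongodb+srv://<target-connection-string>";     // placeholder
const src = db.getSiblingDB("SalesDB").getCollection("Customers");
const tgt = new Mongo(targetUri).getDB("SalesDB").getCollection("Customers");

src.aggregate([{ $sample: { size: 10 } }]).forEach(doc => {
    const copy = tgt.findOne({ _id: doc._id });
    const match = copy !== null && JSON.stringify(doc) === JSON.stringify(copy);
    print(`${doc._id}: ${match ? "match" : "MISMATCH"}`);
});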

RU-optimized copy (Cosmos DB source)

  • Reduce RU consumption when reading from Cosmos DB for MongoDB (RU) by using change-feed-like incremental reads and partition batching. This method skips index creation during the copy; use the schema migration script at https://aka.ms/mongoruschemamigrationscript to create the indexes on the target collections. It focuses on efficient incremental ingestion. Collection filters are not supported.
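
If you prefer to create the target indexes yourself rather than using the schema migration script, a hedged mongosh sketch (placeholder names and URI) is shown below. It copies only key patterns and names; carry over options such as unique or expireAfterSeconds separately if you rely on them.

const targetUri = "mongodb+srv://<vcore-target-connection-string>";   // placeholder
const tgtDb = new Mongo(targetUri).getDB("SalesDB");

db.getSiblingDB("SalesDB").getCollection("Orders").getIndexes()
    .filter(ix => ix.name !== "_id_")                   // the _id index already exists on the target
    .forEach(ix => tgtDb.getCollection("Orders").createIndex(ix.key, { name: ix.name }));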

Security and data handling

  • Connection strings are not persisted to disk; only the endpoints are stored, for display and validation.
  • Provide a PEM CA certificate if your source uses a private CA for TLS.
  • Consider deploying behind VNet integration and/or Private Endpoint (see earlier sections) to restrict access.
  • Entra ID Authentication (Optional): You can configure Entra ID-based authentication for your Azure Web App to ensure only valid users can access the migration tool. This adds an additional layer of security by requiring users to authenticate with their organizational credentials. For configuration details, see Azure App Service Authentication.

Logs, backup, and recovery

  • Logs: Each job writes under migrationlogs. From Job Viewer you can download the current log or a backup if corruption is detected.
  • Job state: Persisted under migrationjobs\list.json with rotating backups every few minutes. Use the “Recover Jobs” button on Home to restore from the best available backup snapshot. This stops any running jobs and replaces the in-memory list with the recovered state.

Performance tips

  • Choose an appropriate App Service plan (P2v3 recommended for large or high-TPS workloads). You can dedicate a web app per large collection.
  • For driver copy, tune “Mongo driver page size” upward for higher throughput, but watch memory and target write capacity.
  • For online jobs, ensure oplog retention on the source is large enough to cover full bulk copy duration plus catch-up.
  • For Dump/Restore, set chunk size to balance disk IO and memory; ensure sufficient disk space on the working folder drive.

Troubleshooting

  • “Unable to load job details” on Home: Use “Recover Jobs” to restore the latest healthy snapshot.
  • Change stream lag not decreasing: Confirm the job is running, source writes exist, and consider increasing plan size or reducing concurrent collections.
  • RU-optimized copy stalls: Validate that the source is Azure Cosmos DB for MongoDB (RU), ensure there are no partition split warnings, and verify the target write capacity.
