Migrating Azure Sentinel off MMA: a field guide for production environments
Microsoft retired the Log Analytics agent (MMA) in August 2024, and Azure Monitor Agent (AMA) is the only supported collector going forward. If you have custom Sentinel connectors still bound to the HTTP Data Collector API or the legacy MMA pipeline, you have probably already started seeing ingestion warnings and double-billed records.
This is the playbook we used on a recent fourteen-connector migration. Zero ingestion gaps during cutover, full audit trail, operating cost down ~52%.
What actually changes under the hood
The Data Collector API let you POST anything to a Log Analytics workspace and it would magic up a custom table. AMA + Logs Ingestion API requires you to define the table shape in a Data Collection Rule (DCR), route ingestion through a Data Collection Endpoint (DCE), and the DCR validates every record against the schema before insertion. That schema enforcement is the silent footgun. Most legacy connectors send fields that drift over time, and the new pipeline will drop records that don't match instead of accepting whatever shape arrives.
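To make the new path concrete, here is a minimal sketch of how a producer addresses the Logs Ingestion API: the DCE hostname, the DCR's immutable ID, and the stream name all appear in the request URL. The function name, endpoint, rule ID, and token are hypothetical placeholders, and the `api-version` value is an assumption to verify against the current documentation. The request is only built here, not sent.

```python
import json
import urllib.request


def build_ingestion_request(dce_url, dcr_immutable_id, stream_name, records, bearer_token):
    """Build (but do not send) a Logs Ingestion API POST for a batch of records."""
    url = (
        f"{dce_url.rstrip('/')}/dataCollectionRules/{dcr_immutable_id}"
        f"/streams/{stream_name}?api-version=2023-01-01"  # assumed API version
    )
    body = json.dumps(records).encode("utf-8")  # the body is a JSON array of records
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
req = build_ingestion_request(
    "https://my-dce.westeurope-1.ingest.monitor.azure.com",
    "dcr-00000000000000000000000000000000",
    "Custom-webhook_CL",
    [{"TimeGenerated": "2024-09-01T12:00:00+00:00", "EventId": "42",
      "EventType": "login", "Payload": {}}],
    "<token>",
)
```

Unlike the Data Collector API, the target table is never named in the request. The stream and the DCR decide where the records land and what shape they must have.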
The pre-cutover audit
Before touching any connector, dump the last 30 days of ingestion shape for every custom table:
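CustomTable_CL
| where TimeGenerated > ago(30d)
// pack_all() keeps column names; pack_array(*) would discard them
| project Bag = pack_all()
// kind=array expands each property into a [name, value] pair
| mv-expand kind=array Pair = Bag
| extend Field = tostring(Pair[0]), Type = gettype(Pair[1])
| summarize TypeSet = make_set(Type), Count = count() by Field
| sort by Count desc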
Look for fields whose TypeSet holds more than one type, such as [string, long] or [string, dictionary] (gettype() reports dynamic objects as dictionary and arrays as array). Those are the fields that will break in AMA's schema-enforced pipeline. Fix the producer to emit a single consistent type, or coerce in the DCR transform.
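If you fix the drift on the producer side, the coercion can be as small as the sketch below. It assumes a hypothetical `SCHEMA` dict that mirrors the column types declared in your DCR stream, and it handles only the `string` and `dynamic` types; the helper name and the choice to drop unknown fields are illustrative, not part of any Azure SDK.

```python
import json

# Hypothetical target schema, mirroring the column types a DCR stream might declare.
SCHEMA = {"EventId": "string", "EventType": "string", "Payload": "dynamic"}


def coerce_record(record, schema=SCHEMA):
    """Return a copy of record with every known field forced to its declared type.

    Fields that are not in the schema are dropped so drift cannot reach the pipeline.
    """
    out = {}
    for field, kind in schema.items():
        if field not in record or record[field] is None:
            continue
        value = record[field]
        if kind == "string":
            # Numbers, booleans, and nested values are serialized to strings.
            out[field] = value if isinstance(value, str) else json.dumps(value)
        elif kind == "dynamic":
            # Wrap scalars so the column always carries an object or an array.
            out[field] = value if isinstance(value, (dict, list)) else {"value": value}
        else:
            raise ValueError(f"unsupported column type: {kind}")
    return out


clean = coerce_record({"EventId": 1001, "EventType": "login", "Payload": "raw", "Extra": 1})
```

Here `EventId` arrives as a number and leaves as the string "1001", the scalar `Payload` is wrapped in an object, and the undeclared `Extra` field is removed.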
Building the DCR + DCE
The cleanest pattern we landed on:
- One DCE per region per environment. Don't share DCEs across prod and non-prod. Cross-environment writes are the source of half the security incidents we have seen.
- One DCR per source application. Don't reuse DCRs across applications. The DCR is the schema contract.
- A transformKql on the DCR that handles legacy field renames and timestamp normalization.
Sample DCR snippet for a generic webhook collector:
// app, location, dce, and workspace are assumed to be declared elsewhere in the template.
resource dcr 'Microsoft.Insights/dataCollectionRules@2023-03-11' = {
  name: 'dcr-webhook-${app}'
  location: location
  properties: {
    dataCollectionEndpointId: dce.id
    streamDeclarations: {
      'Custom-${app}_CL': {
        columns: [
          { name: 'TimeGenerated', type: 'datetime' }
          { name: 'EventId', type: 'string' }
          { name: 'EventType', type: 'string' }
          { name: 'Payload', type: 'dynamic' }
        ]
      }
    }
    dataSources: {}
    destinations: {
      logAnalytics: [
        { name: 'la', workspaceResourceId: workspace.id }
      ]
    }
    dataFlows: [
      {
        streams: ['Custom-${app}_CL']
        destinations: ['la']
        transformKql: 'source | extend TimeGenerated = coalesce(TimeGenerated, now())'
        outputStream: 'Custom-${app}_CL'
      }
    ]
  }
}
Cutover without ingestion gaps
The naive approach is "stop MMA, start AMA, swap connector URLs." That gives you a real gap. The pattern that works:
- Stand up the AMA + DCR + DCE alongside MMA. Let both ingest in parallel for 72 hours.
- Run a KQL comparison every hour. Same count plus or minus 0.5%, same field distributions, same query results. If yes, proceed. If no, fix the DCR transform and run another 72 hours.
- Cut producer traffic over with a feature flag, one connector at a time. Watch ingestion latency on the new pipeline for an hour before moving the next.
- After all producers cut over, keep MMA running but with the workspace destination removed. This gives you a one-week rollback window without paying double-ingest costs.
- Decommission MMA only after the rollback window closes.
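The hourly count check from the parallel-run step can be sketched as below. It assumes you have already exported per-hour record counts from both pipelines into two dicts keyed by hour; the function names and the dict shape are hypothetical, and the 0.5% default comes from the tolerance above. It covers only the count comparison, not field distributions or query results.

```python
def within_tolerance(legacy_count, new_count, tolerance=0.005):
    """True when new_count is within +/- tolerance of legacy_count."""
    if legacy_count == 0:
        return new_count == 0
    return abs(new_count - legacy_count) / legacy_count <= tolerance


def failing_hours(legacy_by_hour, new_by_hour, tolerance=0.005):
    """Return the sorted hour keys whose counts diverge beyond the tolerance."""
    hours = set(legacy_by_hour) | set(new_by_hour)
    return sorted(
        h for h in hours
        if not within_tolerance(legacy_by_hour.get(h, 0), new_by_hour.get(h, 0), tolerance)
    )


# Hypothetical counts: hour "01" is off by 1%, hour "02" is missing on the new path.
bad = failing_hours({"00": 1000, "01": 2000, "02": 10}, {"00": 1004, "01": 1980})
```

An empty result for every hour of the 72-hour window is the signal to proceed. Any non-empty result points at the hours to inspect before touching the DCR transform.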
Where the cost savings come from
The 52% operating-cost reduction was not magic. Three drivers:
- AMA does not double-ingest. The legacy path was writing to both the workspace and the deprecated EventForwarding table for some connectors. That alone was ~20% of the bill.
- DCR transforms let us drop noise fields before ingestion. Dropping RawRequestBody from auth logs cut ingestion volume by 30% on that table.
- Logs Ingestion API has cheaper per-GB cost than the Data Collector API when you batch 50+ records per request.
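For the batching point, a producer only needs a small chunking helper. This is a sketch under assumptions: the helper name and the default batch size of 100 are hypothetical, and it does not check the API's request size limits, which you would still need to respect.

```python
def batches(records, size=100):
    """Yield consecutive lists of at most `size` records; the last one may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 250 hypothetical records become three requests instead of 250.
sizes = [len(b) for b in batches(list(range(250)))]
```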
The thing that bit us
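The DCR validates TimeGenerated against a 48-hour window (24h past, 24h future). If your producer emits historical data on cutover (e.g., backfilling a missed window), the DCR silently drops every record older than 24h with no error returned to the producer. Note that the transform in the snippet above, coalesce(TimeGenerated, now()), only fills in missing timestamps; it does not rescue records that carry an old one. For backfills, re-stamp out-of-window records with now(), either in the transform or in the producer, and keep the original event time in a separate column. This step is not optional. It is the difference between a clean migration and a four-hour debugging session.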
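A producer-side guard for this could look like the sketch below. It takes the 24-hour window on either side of "now" from the description above, and it assumes timestamps are ISO 8601 strings with an explicit UTC offset. `OriginalEventTime` is a hypothetical extra column that you would also have to declare in the DCR stream and the destination table.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)  # accepted distance from "now", per the text above


def restamp_if_out_of_window(record, now=None):
    """Return a copy of record whose TimeGenerated falls inside the accepted window.

    Missing or out-of-window timestamps are replaced with `now`; the original
    value is preserved in OriginalEventTime (a hypothetical extra column).
    """
    now = now or datetime.now(timezone.utc)
    out = dict(record)
    raw = record.get("TimeGenerated")
    # Assumes an offset-aware ISO 8601 string such as 2024-09-08T12:00:00+00:00.
    event_time = datetime.fromisoformat(raw) if raw else None
    if event_time is None or abs(now - event_time) > WINDOW:
        out["OriginalEventTime"] = raw
        out["TimeGenerated"] = now.isoformat()
    return out


# Fixed clock for a reproducible example.
fixed_now = datetime(2024, 9, 10, 12, 0, tzinfo=timezone.utc)
backfilled = restamp_if_out_of_window(
    {"TimeGenerated": "2024-09-08T12:00:00+00:00", "EventId": "1"}, now=fixed_now
)
```

Queries that care about when the event actually happened would then filter on `OriginalEventTime` when it is present, and on `TimeGenerated` otherwise.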
Next steps
If you are mid-migration and want a second set of eyes on the DCR design or the cutover plan, we do fixed-fee Sentinel architecture reviews. Most engagements close in two weeks with a written runbook and a tested rollback plan. Get in touch via the contact page.
