The Content Understanding Service (Generally Available since November 2025) is a multimodal ingestion engine that transforms unstructured content (documents, images, audio, and video) into structured, AI-ready data. Unlike Document Intelligence, which focuses on extracting text from documents, Content Understanding provides a single pipeline for all four content types. The table below summarizes the inputs and outputs for each type:

| Content Type | Input Formats | Output |
|---|---|---|
| Documents | PDF, Word, PowerPoint, HTML | Structured text, tables, key-value pairs |
| Images | JPEG, PNG, TIFF, BMP | Captions, objects, text (OCR), scene descriptions |
| Audio | WAV, MP3, M4A, FLAC | Transcripts, speaker diarization, sentiment |
| Video | MP4, AVI, MOV | Scene segmentation, keyframe extraction, transcripts |
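
A client that accepts mixed uploads must first decide which content type a file belongs to before choosing how to analyze it. The following is a minimal routing sketch under stated assumptions. The extension sets are taken only from the table above. "Word" and "PowerPoint" are assumed to mean `.docx` and `.pptx`, and `.htm` is treated as HTML. The names `SUPPORTED_FORMATS` and `classify_content` are hypothetical and are not part of any Azure SDK. The service may accept formats not listed here, so check the official format list before relying on this to reject files.

```python
from pathlib import Path

# Hypothetical extension map, built only from the formats listed in the table.
# Assumption: "Word"/"PowerPoint" mean .docx/.pptx; the real service may accept more.
SUPPORTED_FORMATS: dict[str, set[str]] = {
    "document": {".pdf", ".docx", ".pptx", ".html", ".htm"},
    "image": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"},
    "audio": {".wav", ".mp3", ".m4a", ".flac"},
    "video": {".mp4", ".avi", ".mov"},
}


def classify_content(filename: str) -> str:
    """Return the content type for a file name, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "report.PDF" -> ".pdf"
    for content_type, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return content_type
    raise ValueError(f"unsupported format: {suffix or '(no extension)'}")


if __name__ == "__main__":
    for name in ["report.PDF", "call.m4a", "clip.mov", "scan.tiff"]:
        print(name, "->", classify_content(name))
```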

Content Understanding integrates directly with Azure AI Search for end-to-end retrieval-augmented generation (RAG) workflows. For example, when you upload a video, the service extracts transcripts, identifies key scenes, and generates searchable metadata. It then pushes the results into your search index, where AI agents can query them.
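
To make the indexing step concrete, the following is a minimal sketch that turns per-scene video output into documents for a search index. It assumes the analysis result has already been reduced to a list of scenes, each with start and end times, a description, and transcript text. The `Segment` class, the `to_search_documents` function, and every field name are hypothetical stand-ins. They are not the service's actual output schema or a required index layout. Uploading the resulting dictionaries, for example with `SearchClient.upload_documents` from the `azure-search-documents` package, is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One scene, in a hypothetical shape (not the real analyzer schema). Times in ms."""
    start_ms: int
    end_ms: int
    description: str
    transcript: str


def to_search_documents(video_id: str, segments: list[Segment]) -> list[dict]:
    """Flatten scenes into one search document per scene.

    Field names are hypothetical; they must match whatever schema your
    Azure AI Search index actually defines.
    """
    docs = []
    for i, seg in enumerate(segments):
        docs.append({
            "id": f"{video_id}-{i}",  # unique key per scene
            "video_id": video_id,
            "start_seconds": seg.start_ms / 1000,
            "end_seconds": seg.end_ms / 1000,
            # Combine the visual description and spoken words into one searchable field.
            "content": f"{seg.description}\n{seg.transcript}".strip(),
        })
    return docs


if __name__ == "__main__":
    scenes = [
        Segment(0, 12500, "Presenter at whiteboard", "Today we cover RAG."),
        Segment(12500, 30000, "Slide with architecture diagram", ""),
    ]
    for doc in to_search_documents("demo-video", scenes):
        print(doc)
```

Emitting one document per scene, rather than one per video, means a search hit can point an agent to a specific timestamp instead of an entire recording.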