
Model Evaluation

🎯 Fine-Tuning & Customization 9 min 80 BASE XP

Measuring Model Quality

Foundry provides built-in evaluation tools to systematically measure your AI outputs.

Built-In Evaluators

| Evaluator    | Measures                                              | Scale |
|--------------|-------------------------------------------------------|-------|
| Groundedness | Is the response supported by the provided context?    | 1-5   |
| Relevance    | Does the response address the user's question?        | 1-5   |
| Coherence    | Is the response well-structured and logical?          | 1-5   |
| Fluency      | Is the language natural and grammatically correct?    | 1-5   |
| Similarity   | How close is the response to a ground-truth answer?   | 0-1   |
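These evaluators score records from a JSONL test dataset, one JSON object per line. A minimal sketch of building such a file (the field names `query`, `response`, `context`, and `ground_truth` are a common convention, not guaranteed by this lesson; confirm the exact schema your chosen evaluators expect):

```python
import json

# Hypothetical test records; field names follow a common evaluation-dataset
# convention (query/response/context/ground_truth) -- verify against the
# schema your evaluators require.
records = [
    {
        "query": "What is the refund window?",
        "response": "Refunds are accepted within 30 days of purchase.",
        "context": "Policy: customers may return items within 30 days.",
        "ground_truth": "30 days",
    },
]

# Write one JSON object per line -- the JSONL format the evaluation run consumes.
with open("test_dataset.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```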

Running Evaluations via SDK

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to your Foundry project (endpoint is shown on the project's overview page)
project = AIProjectClient(
    endpoint="https://<your-resource>.services.ai.azure.com/api/projects/<your-project>",
    credential=DefaultAzureCredential(),
)

# Submit an evaluation run over a JSONL test dataset
evaluation = project.evaluations.create(
    data="test_dataset.jsonl",
    evaluators={
        "groundedness": {"type": "groundedness"},
        "relevance": {"type": "relevance"},
        "coherence": {"type": "coherence"},
    },
)

# Fetch the run and read its aggregate metrics
results = project.evaluations.get(evaluation.id)
print(f"Groundedness: {results.metrics['groundedness']}")
💡 Key Insight: Always evaluate before and after fine-tuning or RAG changes. Without baseline metrics, you can't prove your changes actually improved quality.
FOUNDRY VERIFICATION
QUERY 1 // 1

What does the 'Groundedness' evaluator measure?

- Grammar quality
- Whether the response is supported by the provided context/data
- Response speed
- Token count
Model Evaluation | Fine-Tuning & Customization — Azure Foundry Academy