Foundry provides built-in evaluators for systematically measuring the quality of your AI application's outputs. The table below summarizes the core quality evaluators:
| Evaluator | Measures | Scale |
|---|---|---|
| Groundedness | Is the response supported by the provided context? | 1-5 |
| Relevance | Does the response address the user's question? | 1-5 |
| Coherence | Is the response well-structured and logical? | 1-5 |
| Fluency | Is the language natural and grammatically correct? | 1-5 |
| Similarity | How close is the response to a ground-truth answer? | 1-5 |
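
Evaluations run over a JSONL dataset in which each line is one test case. The field names below (`query`, `context`, `response`) are an assumption for illustration; match them to the inputs your chosen evaluators expect. A minimal way to produce such a file:

```python
import json

# Hypothetical test cases. Groundedness needs the retrieved context;
# relevance and coherence can be judged from the query and response alone.
rows = [
    {
        "query": "What is the return policy?",
        "context": "Orders may be returned within 30 days of delivery.",
        "response": "You can return an order within 30 days of delivery.",
    },
]

# Write one JSON object per line, as the JSONL format requires.
with open("test_dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```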
With the dataset in place, the snippet below creates an evaluation run and reads back the aggregate metrics. It assumes the `azure-ai-projects` and `azure-identity` packages are installed; the exact `evaluations.create` parameters can differ between preview releases of the SDK, so treat this as a sketch and check it against the version you have installed.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to the Foundry project that will host the evaluation run.
project = AIProjectClient.from_connection_string(
    conn_str="<your-project-connection-string>",
    credential=DefaultAzureCredential(),
)

# Start an evaluation run with three built-in quality evaluators.
evaluation = project.evaluations.create(
    data="test_dataset.jsonl",
    evaluators={
        "groundedness": {"type": "groundedness"},
        "relevance": {"type": "relevance"},
        "coherence": {"type": "coherence"},
    },
)

# Evaluation runs are asynchronous; re-fetch the run to read its metrics.
results = project.evaluations.get(evaluation.id)
print(f"Groundedness: {results.metrics['groundedness']}")
```