fix(eval): pass simulation instructions to trajectory evaluator #1740

Draft
Chibionos wants to merge 1 commit into UiPath:main from Chibionos:fix/agent-trajectory-simulation-instructions

Conversation

@Chibionos

Contributor

Summary

  • pass LLMMockingStrategy.prompt through AgentExecution.simulation_instructions when running evaluators
  • add a runtime regression test for simulation instruction propagation
  • add a trajectory evaluator test proving all built-in prompt placeholders are interpolated before the LLM call
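
The placeholder check described in the last bullet could look roughly like the sketch below. It is not the test from this PR: `render_prompt` and `unresolved_placeholders` are hypothetical helpers, and the only placeholder name taken from this description is `{{SimulationInstructions}}`.

```python
import re

# Matches double-brace placeholders such as {{SimulationInstructions}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Hypothetical helper: substitute known placeholders, leave unknown ones intact."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original text so missing values stay visible.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


def unresolved_placeholders(prompt: str) -> list[str]:
    """Return the names of placeholders still present in the rendered prompt."""
    return PLACEHOLDER.findall(prompt)
```

A regression test can then assert that `unresolved_placeholders(...)` is empty for the prompt that would be sent to the LLM.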

Context

Slack thread: https://uipath-product.slack.com/archives/C08D98KT51U/p1781811738866539

The Default Trajectory Evaluator reports missing prompt context for URT eval runs. Legacy eval-set migration stores simulationInstructions in EvaluationItem.mocking_strategy.prompt, but UiPathEvalRuntime.run_evaluator did not pass that value into AgentExecution, so the {{SimulationInstructions}} placeholder in trajectory prompts was never populated.
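
For readers unfamiliar with the data flow, here is a minimal sketch of the propagation described above. The dataclasses are simplified stand-ins that only borrow the names from this description, and `build_agent_execution` is a hypothetical helper; the real classes and `run_evaluator` in `src/uipath/eval/runtime/runtime.py` carry more fields and logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMMockingStrategy:
    # Stand-in: legacy simulationInstructions end up in this field.
    prompt: str


@dataclass
class EvaluationItem:
    # Stand-in: an item may have no mocking strategy at all.
    mocking_strategy: Optional[LLMMockingStrategy] = None


@dataclass
class AgentExecution:
    # Stand-in: the value later substituted for {{SimulationInstructions}}.
    simulation_instructions: str = ""


def build_agent_execution(item: EvaluationItem) -> AgentExecution:
    """Hypothetical helper: copy the mocking prompt into the execution context."""
    strategy = item.mocking_strategy
    instructions = strategy.prompt if isinstance(strategy, LLMMockingStrategy) else ""
    return AgentExecution(simulation_instructions=instructions)
```

Defaulting to an empty string when no LLM mocking strategy is present is an assumption of this sketch, not something the description specifies.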

Validation

  • uv run pytest tests/evaluators/test_evaluator_methods.py::TestLlmJudgeTrajectoryEvaluator tests/cli/eval/test_eval_tracing_integration.py::TestEvaluatorSpanCreation
  • uv run ruff check src/uipath/eval/runtime/runtime.py tests/cli/eval/test_eval_tracing_integration.py tests/evaluators/test_evaluator_methods.py
  • uv run ruff format --check src/uipath/eval/runtime/runtime.py tests/cli/eval/test_eval_tracing_integration.py tests/evaluators/test_evaluator_methods.py

Jira

Jira creation was blocked in the local workflow because no Jira creation provider is configured in this Codex environment. The local case has a ready-to-create payload.
