Verified catalog · test method
Tool-calling correctness
Does the agent call the right tool with the right arguments, and ask for clarification instead of guessing when a request is ambiguous?
Does the agent call the right tool with the right arguments, and ask for clarification instead of guessing when a request is ambiguous?