Verified catalog · test method

Tool-calling correctness

Does the agent call the right tool with the right arguments, and ask for clarification instead of guessing when a request is ambiguous?
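
One way to turn this question into a pass/fail check is sketched below. It is a minimal illustration, not the catalog's actual harness: the `Case` and `AgentTurn` shapes, the `grade` function, the result labels, and the tool names in the usage example are all hypothetical, and exact-match comparison of arguments is an assumed (and fairly strict) scoring rule.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentTurn:
    """What the agent did for one request (hypothetical shape)."""
    tool_name: Optional[str] = None  # None means no tool was called
    arguments: dict[str, Any] = field(default_factory=dict)
    asked_clarification: bool = False


@dataclass
class Case:
    """One request plus the behaviour expected from the agent (hypothetical)."""
    request: str
    expected_tool: Optional[str] = None  # left as None for ambiguous requests
    expected_arguments: dict[str, Any] = field(default_factory=dict)
    ambiguous: bool = False


def grade(case: Case, turn: AgentTurn) -> str:
    """Return 'pass', 'guessed', 'wrong_tool', or 'wrong_arguments'."""
    if case.ambiguous:
        # Ambiguous request: the agent should ask, not call a tool.
        if turn.asked_clarification and turn.tool_name is None:
            return "pass"
        return "guessed"
    # Unambiguous request: the tool name must match first...
    if turn.tool_name != case.expected_tool:
        return "wrong_tool"
    # ...then the arguments (assumption: exact equality, no fuzzy matching).
    if turn.arguments != case.expected_arguments:
        return "wrong_arguments"
    return "pass"


if __name__ == "__main__":
    clear = Case(
        request="What's the weather in Paris?",
        expected_tool="get_weather",  # hypothetical tool name
        expected_arguments={"city": "Paris"},
    )
    vague = Case(request="Book it", ambiguous=True)

    print(grade(clear, AgentTurn("get_weather", {"city": "Paris"})))
    print(grade(vague, AgentTurn("book_flight", {})))
```

Keeping the ambiguous case as a separate branch means a correct tool call on an underspecified request is still scored as a guess, which matches the second half of the question above.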