Tool Usage Leaderboard
Tool call failure rates and user tool rejection rates. Lower is better.
These benchmarks are based on real-world usage by engineers with Claude Code as the coding agent. Model names are hidden from the users during evaluation.
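As a rough illustration of how the two metrics could be tallied, here is a minimal sketch. The leaderboard does not publish its exact definitions, so everything here is an assumption: the `ToolCall` record, its field names, and the choice to divide both failures and rejections by the total number of tool calls per model are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    # Hypothetical record shape; not the leaderboard's actual schema.
    model: str
    failed: bool    # assumed: the tool call errored or was malformed
    rejected: bool  # assumed: the user declined the proposed tool call


def tool_rates(calls):
    """Return {model: (failure_rate, rejection_rate)} over all calls per model."""
    totals = {}
    for call in calls:
        n, failures, rejections = totals.get(call.model, (0, 0, 0))
        totals[call.model] = (
            n + 1,
            failures + int(call.failed),
            rejections + int(call.rejected),
        )
    # Both rates share the same denominator (an assumption); lower is better.
    return {
        model: (failures / n, rejections / n)
        for model, (n, failures, rejections) in totals.items()
    }
```

Under these assumptions, a model with one failed and one rejected call out of four would score 0.25 on each metric.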