Benchmarks

These benchmarks are based on real-world usage by engineers working with Claude Code as their coding agent. Model names are hidden from users during evaluation.