Autonomy-10 Benchmark

The AGI eval for AI agents.

Model Comparison


Autonomy-10

Model               Accuracy                          Efficiency
Operator            23.9%   39.2%   31.0%   5.3%      5.2 min    3.0x   ---
[name missing]      11.7%   23.8%   9.3%    1.7%      5.2 min    2.4x   $1.93
Gemini 2.0 Flash    10.2%   19.0%   11.6%   0.4%      10.4 min   2.0x   $0.61
GPT-4o              5.3%    12.1%   4.8%    1.2%      4.3 min    1.3x   $0.99
Coming Soon...

Evaluate Your Agent

Want to benchmark your own computer-use agent? Contact us to learn more about our evaluation platform and services.