Autonomy-10 Benchmark

The AGI eval for AI agents.

Read more about our methodology and research

Model Comparison

Model	Accuracy				Efficiency
Operator	23.9%	39.2%	31.0%	5.3%	5.2 min	3.0x	---
Claude Computer Use	11.7%	23.8%	9.3%	1.7%	5.2 min	2.4x	$1.93
Gemini 2.0 Flash	10.2%	19.0%	11.6%	0.4%	10.4 min	2.0x	$0.61
GPT-4o	5.3%	12.1%	4.8%	1.2%	4.3 min	1.3x	$0.99
UI-TARS	Coming Soon...
Browser Use	Coming Soon...
Project Mariner	Coming Soon...

Evaluate Your Agent

Want to benchmark your own computer-use agent? Contact us to learn more about our evaluation platform and services.