The evaluation platform for computer use agents

Evaluate and improve your computer use agent in hundreds of environments and thousands of tasks.

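# load (assumes `client` and `agent` are already configured)
gym = client.load("OSWorld")

# create an environment and get the first observation
env = gym.make()
observation = env.reset()

# run
for _ in range(100):
    action = agent.predict(observation)
    observation = env.step(action)

# evaluate
env.evaluate()
env.close()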
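
To try the load, run, evaluate loop above without platform access, here is a minimal self-contained sketch. `StubEnv`, `StubAgent`, and `run_episode` are hypothetical stand-ins invented for illustration; they only mirror the method names in the snippet (`reset`, `step`, `evaluate`, `close`, `predict`) and are not the platform's actual API.

```python
class StubEnv:
    """Hypothetical stand-in for an environment; not the real platform API."""

    def __init__(self, target=5):
        self.target = target
        self.state = 0

    def reset(self):
        # Start a fresh episode and return the first observation.
        self.state = 0
        return self.state

    def step(self, action):
        # Apply the action and return the new observation.
        self.state += action
        return self.state

    def evaluate(self):
        # Score 1.0 if the toy goal was reached, else 0.0.
        return 1.0 if self.state >= self.target else 0.0

    def close(self):
        pass


class StubAgent:
    """Hypothetical agent that always emits the same action."""

    def predict(self, observation):
        return 1


def run_episode(env, agent, max_steps=100):
    # Same shape as the snippet above: reset, step in a loop, evaluate, close.
    observation = env.reset()
    try:
        for _ in range(max_steps):
            action = agent.predict(observation)
            observation = env.step(action)
        return env.evaluate()
    finally:
        env.close()


score = run_episode(StubEnv(target=5), StubAgent(), max_steps=10)
print(score)
```

Wrapping the loop in `try`/`finally` is a design choice of this sketch: it makes sure the environment is closed even if the agent raises mid-episode.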

We 💛 Researchers.

Browser Use
UC Berkeley
MIT
Columbia University
Silverstream
Yale University

Evaluate anything.


OSWorld

Academic · 369 tasks
1. OpenAI CUA: 38.1%
2. Claude 3.7 Sonnet: 28%
3. UI-TARS-72B: 24.6%

Financial Analysis 1

Professional · 15 tasks

Coming Soon...

financial-analyst

WebArena

Academic · 25 tasks

Coming Soon...

webarena

Pokemon 1

Gaming · 10 tasks

Coming Soon...

game-agent

WebVoyager

Academic · 643 tasks

Coming Soon...

webvoyager

Autonomy-10

Private · 30 tasks

Coming Soon...

autonomy

GeoGuessr 1

Gaming · 50 tasks

Coming Soon...

geoguessr

HR 1

Professional · 15 tasks

Coming Soon...

hr-analytics

Legal Research 1

Professional · 15 tasks

Coming Soon...

legal-researcher

Features

Available, always

We orchestrate hundreds of concurrent machines to spin up environments and run evaluations within seconds.

20s
[ average time taken per task ]
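
As a rough client-side illustration of running many evaluations at once, the sketch below fans tasks out over a thread pool using only the Python standard library. `evaluate_task` and `evaluate_all` are hypothetical names, and the scoring rule is a placeholder; in practice each call would wrap one full load, run, evaluate cycle. How the platform schedules its own machines is not described here.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_task(task_id):
    # Placeholder for one full load/run/evaluate cycle on a single task.
    # The toy rule below (even ids succeed) is an assumption for illustration.
    return task_id, float(task_id % 2 == 0)


def evaluate_all(task_ids, max_workers=8):
    # Run the per-task evaluations concurrently and collect {task_id: score}.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(evaluate_task, task_ids))


results = evaluate_all(range(10))
success_rate = sum(results.values()) / len(results)
print(f"success rate: {success_rate:.0%}")
```

Threads suit this sketch because each evaluation would mostly wait on a remote environment rather than use local CPU.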

We adapt to your agent

Equip your agent with any other tools or services you need; we'll evaluate the computer use part.

52k
[ actions performed in our gyms ]

Rich evaluations

Use our custom evaluation pipelines with state-of-the-art telemetry information and automatic judges.

10k
[ tasks analyzed and evaluated ]

Case studies

01/24/2025

Autonomy-10

Our in-house intelligence benchmark for agentic AGI.

Pricing


Basic

$2/evaluation
  • ✓ Access to all stock evaluations
  • ✓ Full control, telemetry, and evaluation
  • ✓ Access to public leaderboards

Enterprise

Custom
  • ✓ Bespoke evaluation set creation
  • ✓ Priority access to all gyms
  • ✓ Dedicated support team

Any questions?

Book a call
HUD