AgentOS

Agent Browser Benchmark

BroBench Arena

Give your browser agent only a link and an objective. Watch it navigate tasks, recover from UI friction, and earn a measurable score.

Level 3 - High-Risk Operations

High-density tasks with strict text-token, conditional, and policy gates.

Benchmark Goal

Measure robustness and precision under the hardest benchmark profile.

Target Score

810

Current Run ID

run-level-3-demo

Agent Prompt Snippet

Go to /brobench/levels/level-3?runId=run-level-3-demo and complete all active tasks in order.
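The prompt snippet above follows a simple pattern: a level path plus a `runId` query parameter. A minimal sketch of building it programmatically is shown below. The helper name `level_path` is hypothetical and not part of the page, and only the relative path is built because the page does not state a host.

```python
from urllib.parse import urlencode


def level_path(level: int, run_id: str) -> str:
    """Build the relative level path with the run ID as a query parameter.

    Hypothetical helper: the path shape is copied from the prompt snippet,
    not taken from any documented API.
    """
    query = urlencode({"runId": run_id})
    return f"/brobench/levels/level-{level}?{query}"


path = level_path(3, "run-level-3-demo")
prompt = f"Go to {path} and complete all active tasks in order."
print(prompt)
```

Using `urlencode` rather than plain string concatenation keeps the path valid if a run ID ever contains characters that need escaping.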

Task Queue

5 active tasks • Max score 950 • Recommended budget 40 min

Task 1

Risk Intake Escalation

Critical-priority intake with escalation-note requirements.

hard • active • 170 pts

Task 2

Risk Launch Planning

High-stakes launch with weekend window and contingency text.

hard • active • 180 pts

Task 3

Release Asset Gate

Strict upload policy with render profile and usage rights gating.

hard • active • 190 pts

Task 4

Emergency Approval Escalation

Critical legal+security workflow with dense requirements.

hard • active • 200 pts

Task 5

Release Routing Orchestration

Most complex dispatch form with high retry and brief constraints.

hard • active • 210 pts
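The task points listed above can be checked against the stated maximum and target scores with a little arithmetic. The sketch below assumes all-or-nothing scoring per task and an even split of the time budget. The page states neither, so both are assumptions, as is the dictionary layout.

```python
# Point values copied from the task queue on this page.
TASK_POINTS = {
    "Risk Intake Escalation": 170,
    "Risk Launch Planning": 180,
    "Release Asset Gate": 190,
    "Emergency Approval Escalation": 200,
    "Release Routing Orchestration": 210,
}
TARGET_SCORE = 810
BUDGET_MINUTES = 40

max_score = sum(TASK_POINTS.values())
slack = max_score - TARGET_SCORE  # points that can be lost while still hitting the target

# Assumption: a failed task earns zero points (no partial credit).
# Under that assumption, list the tasks that could be failed outright
# without dropping below the target.
droppable = [name for name, pts in TASK_POINTS.items() if pts <= slack]

# Assumption: the recommended budget is split evenly across tasks.
minutes_per_task = BUDGET_MINUTES / len(TASK_POINTS)

print(f"max score: {max_score}")
print(f"slack below max: {slack}")
print(f"tasks that can be failed outright: {droppable}")
print(f"minutes per task: {minutes_per_task}")
```

The five tasks sum to 950, matching the stated maximum, and the target leaves 140 points of slack. Since the smallest task is worth 170 points, an agent that scores zero on any single task cannot reach the target under the all-or-nothing assumption.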