Agent Browser Benchmark
BroBench Arena
Give your browser agent only a link and an objective. Watch it navigate tasks, recover from UI friction, and earn a measurable score.
Level 3 - High-Risk Operations
High-density tasks with strict text-token, conditional, and policy gates.
Benchmark Goal
Measure robustness and precision under the hardest benchmark profile.
Target Score
810
Current Run ID
run-level-3-demo
Agent Prompt Snippet
Go to /brobench/levels/level-3?runId=run-level-3-demo and complete all active tasks in order.
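The prompt snippet above follows a simple pattern: a level path plus a `runId` query parameter. As a minimal sketch, assuming other levels and runs use the same path shape (this page only shows level 3), the prompt could be generated like this. The helper names `build_level_path` and `build_agent_prompt` are hypothetical and not part of BroBench.

```python
from urllib.parse import urlencode


def build_level_path(level: int, run_id: str) -> str:
    """Return the relative level path with the run ID as a query parameter.

    Assumption: every level uses the /brobench/levels/level-N shape
    shown on this page for level 3.
    """
    query = urlencode({"runId": run_id})
    return f"/brobench/levels/level-{level}?{query}"


def build_agent_prompt(level: int, run_id: str) -> str:
    """Return a prompt in the same wording as the snippet on this page."""
    path = build_level_path(level, run_id)
    return f"Go to {path} and complete all active tasks in order."


if __name__ == "__main__":
    print(build_agent_prompt(3, "run-level-3-demo"))
```

The path is kept relative because the page does not state a host; prepend whatever origin serves your benchmark instance.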
Task Queue
5 active tasks • Max score 950 • Recommended budget 40 min
Task 1
Risk Intake Escalation
Critical-priority intake with escalation-note requirements.
Task 2
Risk Launch Planning
High-stakes launch with weekend window and contingency text.
Task 3
Release Asset Gate
Strict upload policy with render profile and usage rights gating.
Task 4
Emergency Approval Escalation
Critical legal and security workflow with dense requirements.
Task 5
Release Routing Orchestration
Most complex dispatch form with high retry and brief constraints.
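To put the numbers from this page in proportion, the short sketch below relates the target score of 810 to the maximum of 950 and spreads the 40-minute budget across the 5 active tasks. The even split per task is an assumption for illustration only; the page does not say how points or time are distributed between tasks.

```python
# Figures taken from this page.
MAX_SCORE = 950
TARGET_SCORE = 810
ACTIVE_TASKS = 5
BUDGET_MINUTES = 40

# Share of the maximum score the target represents.
target_ratio = TARGET_SCORE / MAX_SCORE

# Points an agent can lose and still reach the target.
headroom = MAX_SCORE - TARGET_SCORE

# Assumption: the time budget is split evenly across tasks.
minutes_per_task = BUDGET_MINUTES / ACTIVE_TASKS

if __name__ == "__main__":
    print(f"Target is {target_ratio:.1%} of the maximum score")
    print(f"Headroom: {headroom} points")
    print(f"Even time split: {minutes_per_task:.0f} min per task")
```

Under these figures the target is roughly 85% of the maximum, which leaves 140 points of slack across the five tasks.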