CRAB: Cross-environment Agent Benchmark

SKU: crab

CRAB (Cross-environment Agent Benchmark) is an open-source framework developed by CAMEL-AI for constructing and evaluating environments designed for large language model (LLM) agents. It supports the creation of cross-platform environments, enabling deployment across in-memory systems, Docker-hosted environments, virtual machines, or distributed physical machines. CRAB introduces a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction, facilitating comprehensive assessment of agent performance across diverse settings.

Use cases:
Developing and benchmarking LLM agents across multiple environments.
Evaluating agent performance with fine-grained, graph-based metrics.
Constructing tasks and evaluators efficiently for comprehensive agent assessment.
Facilitating cross-platform deployment of AI agents in diverse settings.
Advancing research in multimodal language model agents and their applications.
CRAB supports a high degree of agent autonomy: its multi-agent architecture allows simultaneous control of multiple devices, with task decomposition handled via graph evaluators. Humans must still set up each task initially, which limits full autonomy. Even so, the reported 38% completion rate for GPT-4o on novel cross-platform workflows suggests substantial independence in environment navigation compared to single-device benchmarks.
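To make the idea of graph-based, fine-grained evaluation concrete, the following is a minimal sketch in plain Python. It is not CRAB's actual API: the names `EvalNode` and `completion_ratio`, the state keys, and the example task are all hypothetical, and the sketch assumes nodes are listed so that dependencies come before the nodes that need them. Each node is a sub-goal check, edges express prerequisites, and the score is the fraction of sub-goals satisfied rather than a single pass/fail result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical snapshot of the environments an agent acted on.
State = Dict[str, bool]


@dataclass
class EvalNode:
    """One sub-goal in the evaluation graph (illustrative, not CRAB's class)."""
    name: str
    check: Callable[[State], bool]
    depends_on: List[str] = field(default_factory=list)


def completion_ratio(nodes: List[EvalNode], state: State) -> float:
    """Return the fraction of sub-goals completed.

    A node counts as completed only if all of its prerequisites are
    completed and its own check passes. Assumes `nodes` is given in
    topological order (prerequisites first).
    """
    if not nodes:
        return 0.0
    done = set()
    for node in nodes:
        prerequisites_met = all(dep in done for dep in node.depends_on)
        if prerequisites_met and node.check(state):
            done.add(node.name)
    return len(done) / len(nodes)


# Hypothetical cross-platform task: read a message on a phone,
# then create a file on a desktop and write the message into it.
task_graph = [
    EvalNode("message_read", lambda s: s.get("phone_message_read", False)),
    EvalNode("file_created", lambda s: s.get("desktop_file_exists", False),
             depends_on=["message_read"]),
    EvalNode("text_written", lambda s: s.get("file_has_text", False),
             depends_on=["file_created"]),
]

# The agent finished the first two sub-goals but not the third.
partial_state = {
    "phone_message_read": True,
    "desktop_file_exists": True,
    "file_has_text": False,
}
ratio = completion_ratio(task_graph, partial_state)
print(f"completion ratio: {ratio:.2f}")
```

In this sketch an agent that completes two of three sub-goals receives partial credit instead of a flat failure, which is the kind of distinction a completion-rate metric is meant to capture.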
Open Source