Latest News
Tongyi Laboratory has released and open-sourced the evaluation benchmark PawBench v1.0. It targets personal-assistant and general-purpose agent scenarios, and brings base models and execution frameworks (Harness) into a single evaluation system. According to the announcement, PawBench is not simply a model leaderboard but a cross-evaluation of "model, Harness, task" combinations.

