Blog Archive

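Blogs and Papers

This page collects our published blog posts, paper walkthroughs, and technical articles in one place, so they no longer take up scrolling space on the home page.

12 items in total, listed in reverse chronological order by publication date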

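Paper arXiv'26 Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation

Date: 2026/02/14 Authors: Weiming Zhang, Jihong Wang, Jiamu Zhou, et al.

Moves web navigation search from the atomic action space to the semantic plan space, improving search efficiency and stability on long-horizon web tasks.

Read article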

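Paper arXiv'26 Adaptive Milestone Reward for GUI Agents

Date: 2026/02/11 Authors: Congmin Zheng, Xiaoyun Mo, Xinbei Ma, et al.

Uses verifiable, evolvable milestone rewards and asymmetric credit assignment to ease the temporal credit assignment problem in long-horizon GUI reinforcement learning.

Read article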

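Paper ACL'26 Industry track (Oral) ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution

Date: 2026/01/07 Authors: Jihong Wang, Jiamu Zhou, Weiming Zhang, et al.

Uses human-in-the-loop knowledge adaptation and knowledge-aligned progressive summarization to improve stability across heterogeneous websites and long-horizon web tasks.

Read article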

Paper ACL'26 Findings Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning

Date: 2026/01/03 Authors: Zheng Wu, Xingyu Lou, Xinbei Ma, et al.

Uses geometric consensus filtering and curvature-weighted fusion of task vectors to ease the stability-plasticity dilemma in agent continual learning.

Read article

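Paper WWW'26 ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks

Date: 2025/10/14 Authors: Yuanyi Song, Heyuan Huang, Qiqiang Lin, et al.

Uses a graph structure to statically simulate dynamic phone interaction, supporting multiple correct paths, atomic capability analysis, and evaluation of complex long-horizon tasks.

Read article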

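Paper arXiv'25 VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents

Date: 2025/09/07 Authors: Zheng Wu, Heyuan Huang, Xingyu Lou, et al.

Lets the OS agent proactively query the human in untrustworthy scenarios and act autonomously in trustworthy ones, reducing the risk of over-execution.

Read article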

Paper arXiv'25 Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

Date: 2025/08/08 Authors: Zheng Wu, Heyuan Huang, Yanjia Yang, et al.

Models explicit SOPs and an implicit habit library through two channels, bringing the mobile-use agent closer to each specific user's real intent.

Read article

Paper NeurIPS'25 MobileUse: A GUI Agent with Hierarchical Reflection for Autonomous Mobile Operation

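Date: 2025/07/16 Authors: Ning Li, Xiangmou Qu, Jiamu Zhou, et al.

Uses three levels of reflection (action, trajectory, and global) plus a proactive exploration mechanism to improve robust execution on long-horizon mobile tasks.

Read article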

Paper ACL'25 Findings HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios

Date: 2024/12/12 Authors: Jun Wang, Jiamu Zhou, Muning Wen, et al.

A fine-grained function-calling benchmark for multi-turn dialogue with mobile assistants.

Read article

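Paper LLM-based Multi-Agent Systems: Techniques and Business Perspectives

Date: 2024/11/14 Authors: Yingxuan Yang, Qiuying Peng, Jun Wang, et al.

Discusses the overall framework of LLM-based multi-agent systems, from protocols, training, security, and privacy to the monetization of traffic and intelligence.

Read article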

Paper ICLR'25 (Spotlight) Hammer: Robust Function-Calling for On-Device Language Models via Function Masking

Date: 2024/10/04 Authors: Qiqiang Lin, Muning Wen, Qiuying Peng, et al.

Hammer, a lightweight function-calling model built on function masking.

Read article

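Paper ACL'24 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

Date: 2024/01/02 Authors: Wenqi Zhang, Yongliang Shen, Linjuan Wu, et al.

Has the model first generate multiple solving perspectives, then contrast their differences and summarize them into a checklist, improving the stability of self-reflection when no external feedback is available.

Read article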