Adversarial Strategy Generation Integrating Expert Policies and Multi-Chain-of-Thought Reasoning

Affiliation: National University of Defense Technology
CLC number: TP18
Funding: National Natural Science Foundation of China (Grant No. 72301289)
Abstract:

Decision-making agents driven by large language models (LLMs) still lack accuracy and fine-grained control in adversarial game scenarios. To address this, we propose an adversarial strategy generation method that integrates expert policies with multi-chain-of-thought reasoning. The method fuses real-time game frames (images) with observation information (text) to give the LLM a more complete picture of the situation, and embeds time-phased expert strategies and multi-chain-of-thought reasoning modules in the prompt, which substantially improves the agent's control and decision precision. In experiments on typical adversarial scenarios in StarCraft II at the "high difficulty" level, the method achieves a 95% win rate without any additional training of the LLM. The results indicate that the method can generate interpretable, fine-grained decisions in highly dynamic adversarial environments, and that it offers a promising direction for research on LLM-based action strategy generation in strongly adversarial settings.
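The abstract describes the prompt design only at a high level. The sketch below is one possible reading of it, not the authors' implementation: it picks an expert hint according to the current game time and lists one reasoning instruction per chain. Every name (`ExpertRule`, `select_expert_hint`, `build_prompt`), the phase boundaries, the advice strings, and the chain labels are hypothetical, and the image is passed only as a file reference rather than sent to a multimodal model.

```python
from dataclasses import dataclass

# Hypothetical illustration: none of these names or values come from the paper.


@dataclass(frozen=True)
class ExpertRule:
    start_s: float  # inclusive start of the game-time window, in seconds
    end_s: float    # exclusive end of the window
    advice: str     # expert guidance to embed in the prompt


def select_expert_hint(rules, game_time_s):
    """Return the advice of the first rule whose window contains game_time_s."""
    for rule in rules:
        if rule.start_s <= game_time_s < rule.end_s:
            return rule.advice
    return "No phase-specific guidance."


def build_prompt(observation_text, image_ref, game_time_s, rules, chains):
    """Assemble one text prompt from the observation, an image reference,
    the time-phased expert hint, and one instruction per reasoning chain."""
    lines = [
        f"Game time: {game_time_s:.0f}s",
        f"Screenshot: {image_ref}",
        f"Observation: {observation_text}",
        f"Expert guidance: {select_expert_hint(rules, game_time_s)}",
        "Reason step by step along each chain, then give one final action:",
    ]
    for index, (name, instruction) in enumerate(chains, start=1):
        lines.append(f"{index}. {name}: {instruction}")
    return "\n".join(lines)


# Made-up example inputs.
rules = [
    ExpertRule(0, 180, "Prioritise economy and scouting."),
    ExpertRule(180, float("inf"), "Concentrate forces before engaging."),
]
chains = [
    ("Threat assessment", "Which enemy units are most dangerous right now?"),
    ("Unit control", "Which of our units should move, attack, or retreat?"),
]
prompt = build_prompt("12 Marines; 3 enemy Zealots visible",
                      "frame_0420.png", 200, rules, chains)
print(prompt)
```

Keeping the expert rules as data rather than hard-coding them in the prompt text makes it easy to swap guidance per game phase without retraining or editing the rest of the prompt, which matches the training-free setting the abstract reports.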

History
  • Received: 2025-04-14
  • Last revised: 2025-09-09
  • Accepted: 2025-09-15