Dynamic-Chain-of-Reasoning-and-Decision: Enhancing Decision-Making Capabilities of Large Language Models in Adversarial Games

Affiliation:

1. Laboratory for Big Data and Decision, College of Systems Engineering, National University of Defense Technology; 2. College of Artificial Intelligence, Nankai University

CLC number:

TP18

Fund Project:

National Natural Science Foundation of China (72301289)

    Abstract:

    To address the limited reasoning and decision-making ability of large language models (LLMs) in complex adversarial game scenarios, the dynamic-chain-of-reasoning-and-decision (DCoRD) method is proposed, motivated by the text generation mechanism of LLMs and their reasoning limitations. DCoRD consists of a reasoning-decision framework and a dynamic decision option library, and serves as structured prompt engineering that enhances the reasoning and decision-making abilities of LLMs. By using the task objectives to constrain the output format and content scope, the method reduces model hallucinations and improves decision accuracy. Four approaches are compared on StarCraft II, a classic adversarial game testbed: a baseline (free-generation) method, traditional chain-of-thought, chain-of-draft, and the proposed DCoRD. Experimental results show that DCoRD significantly increases the win rate of LLM agents in adversarial tasks while reducing computational resource consumption (token usage) and response time, and improving decision accuracy and task alignment. The method offers a new theoretical and methodological reference for applying LLMs to adversarial game tasks.
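The abstract describes the mechanism only at a high level, so the following is a minimal sketch of what "a dynamic decision option library plus a constrained output format" could look like, not the authors' implementation. Every name in it (`Option`, `OptionLibrary`, `build_prompt`, `parse_decision`, the `REASONING:`/`DECISION:` reply format, and the three example options with their preconditions) is a hypothetical assumption introduced for illustration. The idea shown is that the set of allowed decisions is recomputed from the current game state, the prompt lists only those decisions, and any reply that breaks the format or names a decision outside the set is rejected.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

GameState = Dict[str, int]  # hypothetical, heavily simplified observation


@dataclass
class Option:
    name: str
    available: Callable[[GameState], bool]  # precondition on the state


@dataclass
class OptionLibrary:
    options: List[Option] = field(default_factory=list)

    def current(self, state: GameState) -> List[str]:
        """Return the names of the options that are valid in this state."""
        return [o.name for o in self.options if o.available(state)]


def example_library() -> OptionLibrary:
    # Illustrative options and thresholds only; not taken from the paper.
    return OptionLibrary([
        Option("BUILD_SUPPLY_DEPOT", lambda s: s["minerals"] >= 100),
        Option("TRAIN_MARINE",
               lambda s: s["minerals"] >= 50 and s["supply_left"] >= 1),
        Option("GATHER", lambda s: True),
    ])


def build_prompt(state: GameState, allowed: List[str]) -> str:
    """Build a structured prompt that fixes both content scope and format."""
    facts = ", ".join(f"{k}={v}" for k, v in sorted(state.items()))
    return (
        f"State: {facts}\n"
        f"Allowed decisions: {', '.join(allowed)}\n"
        "Answer in exactly two lines:\n"
        "REASONING: <one short sentence>\n"
        "DECISION: <one allowed decision>"
    )


def parse_decision(reply: str, allowed: List[str]) -> Optional[str]:
    """Accept a reply only if it has one DECISION line naming an allowed option."""
    decisions = [line.split(":", 1)[1].strip()
                 for line in reply.splitlines()
                 if line.strip().startswith("DECISION:")]
    if len(decisions) != 1:
        return None  # format violation: missing or repeated decision line
    return decisions[0] if decisions[0] in allowed else None
```

In use, the caller would compute `allowed = example_library().current(state)`, send `build_prompt(state, allowed)` to the model, and pass the reply to `parse_decision`; a `None` result signals an out-of-scope or malformed answer that the agent can retry or replace with a default action.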

History
  • Received: 2025-04-14
  • Last revised: 2025-09-09
  • Accepted: 2025-09-15