Adversarial Strategy Generation Integrating Expert Policies and Multi-Chain-of-Thought Reasoning
Affiliation: National University of Defense Technology
CLC number: TP18
Funding: National Natural Science Foundation of China (72301289)

Abstract:

To improve the decision accuracy and granularity of AI agents driven by large language models (LLMs) in adversarial game settings, this paper proposes an adversarial strategy generation method that integrates expert policies with multi-chain-of-thought (multi-CoT) reasoning. The method fuses the real-time game screen (image) with observation information (text) to give the LLM a more complete picture of the situation, and embeds time-phased expert strategies and a multi-CoT reasoning module in the prompt, which substantially improves the precision of the agent's control and decisions. In experiments on a representative StarCraft II adversarial scenario at a high difficulty level, the method achieves a 95% win rate without any additional training of the LLM. The results indicate that the method can generate interpretable, fine-grained decisions in highly dynamic adversarial environments, and suggest a promising direction for research on generating action strategies with LLMs in strongly adversarial scenarios.
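The abstract does not give the actual prompt format, so the following is only a minimal sketch of how time-phased expert strategies and several reasoning chains might be embedded in a single prompt alongside image and text observations. Every name here (`PhaseStrategy`, `EXPERT_PHASES`, `REASONING_CHAINS`, `build_prompt`), the phase boundaries, and the advice strings are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseStrategy:
    """Expert advice that applies to one window of game time (hypothetical)."""
    start_s: int  # inclusive start of the window, in seconds
    end_s: int    # exclusive end of the window, in seconds
    advice: str


# Assumed phase boundaries and advice; the paper's real expert strategies
# are not stated in the abstract.
EXPERT_PHASES = [
    PhaseStrategy(0, 180, "Prioritize economy and scouting."),
    PhaseStrategy(180, 480, "Build a defensive army and expand."),
    PhaseStrategy(480, 10**9, "Commit to decisive engagements."),
]

# Assumed reasoning chains; each asks the model to reason along one axis.
REASONING_CHAINS = [
    "Chain A (economy): reason step by step about resources and production.",
    "Chain B (combat): reason step by step about unit positioning and targets.",
    "Chain C (threat): reason step by step about likely enemy actions.",
]


def select_strategy(game_time_s, phases=EXPERT_PHASES):
    """Return the expert strategy whose time window contains game_time_s."""
    for phase in phases:
        if phase.start_s <= game_time_s < phase.end_s:
            return phase
    raise ValueError(f"no expert strategy covers t={game_time_s}s")


def build_prompt(game_time_s, observation_text, image_ref):
    """Assemble one prompt from the image reference, text observation,
    the phase-specific expert advice, and the reasoning chains."""
    strategy = select_strategy(game_time_s)
    parts = [
        f"[Game time] {game_time_s}s",
        f"[Screen image] {image_ref}",
        f"[Observation] {observation_text}",
        f"[Expert strategy] {strategy.advice}",
        "[Reasoning] Work through each chain separately, "
        "then give one final action:",
    ]
    parts.extend(f"- {chain}" for chain in REASONING_CHAINS)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(200, "12 workers, 4 marines, enemy scout seen",
                       "frame_000200.png"))
```

In this sketch the LLM itself is never called; the point is only that the expert advice changes with game time while the image reference, text observation, and reasoning chains are supplied on every step, with no model training involved.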

History

Received: 2025-04-14
Last revised: 2025-09-09
Accepted: 2025-09-15
Published online: 2026-05-26