计算轨道追逃闭环均衡的有限差分方法

杨傅云翔，杨乐平，柴华; YANG Fuyunxiang; YANG Leping; CHAI Hua

计算轨道追逃闭环均衡的有限差分方法

doi: 10.11887/j.issn.1001-2486.24100008

杨傅云翔¹ ，杨乐平² ，柴华¹

1. 航天工程大学, 北京 101416

2. 国防科技大学空天科学学院, 湖南长沙 410073

基金项目: 军事科技领域青年人才托举工程资助项目(2020-JCJQ-QT-024)

详细信息

作者简介

杨傅云翔（1996—），男，江苏镇江人，讲师，博士，E-mail: yangfuyunxiang@alumni.nudt.edu.cn

通讯作者

柴华（1988—），男，山西沁水人，副教授，博士，硕士生导师，E-mail: chaihua@nudt.edu.cn

中图分类号: V448

文献标识码: A

文章编号: 1001-2486(2026)01-247-15

A finite difference method for calculating the closed-loop equilibrium of orbital pursuit-evasion game

YANG Fuyunxiang¹ ， YANG Leping² ， CHAI Hua¹

1. Space Engineering University, Beijing 101416 , China

2. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073 , China

摘要

针对近距离轨道追逃闭环均衡构造问题，提出一种综合运用Bellman最优性原理、有限差分法和插值技术的计算方法。推导视线坐标系下的博弈系统降维动力学模型，建立近距离轨道追逃博弈模型，降低系统状态空间维度；基于Bellman最优性原理，重构原问题为哈密顿-雅可比-艾萨克偏微分方程终值问题，通过逆向分析实现同时处理多组博弈场景；利用Cartesian网格离散状态空间，使用有限差分法计算均衡受动力学驱动的动态演化过程，分析博弈态势；基于控制与均衡空间梯度的关系，使用数值插值构造闭环控制函数；通过数值仿真验证了方法的有效性。

关键词

轨道追逃 / Bellman最优性原理 / 有限差分法 / 微分对策

Abstract

The issue of constructing the closed-loop equilibrium for close-range orbital pursuit-evasion games was addressed and a computation method that integrates Bellman′s principle of optimality, the finite difference method, and interpolation techniques was proposed. A dimension-reduction dynamics of the game system in the line-of-sight coordinate frame was derived, establishing a close-range orbital pursuit-evasion game model and reducing the dimensionality of the system′s state space. Based on Bellman′s principle of optimality, the original problem was reformulated as a Hamilton-Jacobi-Isaacs partial differential equation terminal value problem, enabling the simultaneous handling of multiple game scenarios through reverse-time analysis. The state space was discretized using Cartesian grids, and the finite difference method was employed to calculate the dynamic evolution process of the equilibrium driven by the dynamics, and analyze the game situation. Utilizing the relationship between control and the spatial gradient of the equilibrium, numerical interpolation was applied to construct the closed-loop control function. The effectiveness of the proposed method was demonstrated through numerical simulations.

Keywords

orbital pursuit-evasion game / Bellman′s principle of optimality / finite difference method / differential game

1 问题建模 1.1 视线运动方程 1.2 无量纲降维动力学模型 1.3 微分对策 2 闭环均衡求解方法 2.1 问题重构 2.2 PDE求解 2.2.1 网格生成 2.2.2 空间离散 2.2.3 时间离散 2.3 控制与轨迹生成 3 仿真算例 3.1 场景通用参数设置 3.2 推力构型Ⅰ：各方向机动能力有限 3.2.1 问题重构 3.2.2 支付泛函演化结果 3.2.3 博弈场景计算结果 3.3 推力构型Ⅱ：总机动能力有限 3.3.1 问题重构 3.3.2 支付泛函演化结果 3.3.3 博弈场景计算结果 3.4 博弈规则总结 4 结论

近年来，轨道追逃问题逐渐成为航天动力学与控制领域的研究热点，受到越来越多学者的关注。该问题实质上是一个包含双边控制的连续动态博弈问题，其中被称为追踪方（pursuer，P）与逃逸方（evader，E）的局中人具有相互冲突的目标^[1]。整个博弈过程通过零和微分对策问题进行描述，通过求解均衡，构造控制函数作为策略，规划局中人最优轨迹^[2]。

均衡是非合作博弈理论的重要概念。研究动态博弈时，通常会考察开环和闭环两类均衡，其中：开环均衡是指局中人策略仅依赖于时间的博弈均衡；闭环均衡是指局中人策略依赖于时间和当前观测状态量的博弈均衡^[3]。目前轨道追逃领域主要关注开环均衡的构造方法，发展出了两条技术路线：一条是离散并参数化决策量或状态量，将问题转化为一个参数优化问题进行求解^[4]。例如，Luke^[5]轮流固定一方策略求解另一方策略，循环迭代直至双方策略收敛；Zhang等^[6]针对椭圆参考轨道附近的自由时域问题，设计了一种基于深度学习的直接法，将训练后网络的输出作为启动伪谱法的初值。该方法虽然收敛域较广且无须切换结构先验知识，但解的精度不高，且最优性无法保证。另一条是基于双边极值原理将原始问题转化成两点边值问题（two point boundary value problem，TPBVP）进行求解。因为解的精度高，且满足一阶最优性必要条件等特点，该技术路线是目前求解轨道追逃均衡最常用的方法，产生了丰富的研究成果^[7-11]。同时，也有学者针对该技术路线收敛域小、初值敏感等缺陷开发了改善方法。Hafer^[12]针对远距离追逃场景提出了一种敏感性方法，计算状态和约束条件敏感性矩阵，基于同伦法实现从无重力问题延拓至真实问题，提高了计算效率。Li等^[13]使用差分进化为Newton法提供初值猜测，提升了方法稳定性。

随着研究的深入，构造开环均衡的缺陷逐渐显露，主要表现在以下三点：①由于开环均衡特性，局中人双方并不能根据实际博弈进程动态调整自己的控制策略；②最优策略生成依赖均衡轨迹，当博弈态势偏离均衡轨迹时，需要重新计算；③受初始博弈态势约束，无法同时处理多组场景。

针对上述问题，部分学者开始关注闭环均衡。虽然针对线性二次型之类特殊问题构建了相对完整的理论方法与处理流程^[14-15]，但总体上说，闭环均衡求解方法的研究并不充分。当前的主流思路是通过计算大量开环均衡近似闭环效果。Stupik^[16]针对近距离共面追逃场景，配合Kriging插值以及粒子群优化算法，为均衡求解快速提供初值。Yang等^[17]基于深度神经网络（deep neural network，DNN）模型对TPBVP关键参数进行重参数化，将问题转化为神经网络训练过程，借助DNN模型的记忆能力与效率优势实现闭环均衡近似。张乘铭等^[18]针对交会型轨道追逃问题，提出了一种基于神经网络的滚动时域优化方法求解博弈对策，通过将神经网络猜测轨迹与伪谱法结合实现策略的快速更新。但该思路存在以下两点问题：一方面，依然受制于博弈场景初始态势，一次只能求解一场博弈，难以总结共性规律；另一方面，单一场景计算成本较高，需要大量计算开环均衡保证近似效果。

针对上述问题，本论文综合运用Bellman最优性原理、有限差分法以及样条插值技术，提出了一种构造闭环均衡、计算控制函数的方法，实现同时处理多组博弈场景，为分析博弈态势、总结共性规律提供支撑。首先推导视线坐标系下的相对运动方程，降低状态空间维度，明确支付泛函，建立轨道追逃博弈模型；其次，基于Bellman最优性原理，将问题重构为哈密顿-雅可比-艾萨克（Hamilton-Jacobi-Isaacs，HJI）偏微分方程（partial differential equation，PDE）终值问题，通过逆向分析消除场景初始态势可变的影响；然后利用Cartesian网格离散计算区域，基于有限差分法使用加权本质无振荡-总变差不增（weighted essentially non-oscillatory-total-variation-diminishing，WENO-TVD）求解器计算均衡受动力学驱动的演化过程，分析博弈态势；再基于控制函数与均衡空间梯度之间的关系，为多组场景同时设计闭环控制函数，总结共性规律；最后通过数值仿真验证了方法的有效性。

1 问题建模

1.1 视线运动方程

惯性空间中，从P质心指向E质心的单位矢量称为视线。P与E在地心惯性（earth centrical inertial，ECI）坐标系中的相对位置关系如图1所示。

图1博弈双方相对位置关系

Fig.1Relative position relationship between game players

其中：

r = r_{E} - r_{P} = r e_{r}

(1)

式中，r表示P与E之间的相对距离，e_r为视线方向单位矢量。

定义视线的瞬时旋转角速度：

ω_{s} = ω_{s} e_{ω}

(2)

式中，ω_s表示视线瞬时转率，e_ω为瞬时角速度方向单位矢量。以P质心为原点，e_r、e_ω以及e_θ

≜

e_ω×e_r三组单位矢量组成视线旋转坐标系。

定义e_ω的旋转角速度为：

Ω_{s} = Ω_{s} e_{r}

(3)

归纳（e_r，e_ω，e_θ）的变化规律，即视线运动方程如下：

\frac{d e_{r}}{d t} = ω_{s} e_{θ}

(4)

\frac{d e_{θ}}{d t} = - ω_{s} e_{r} + Ω_{s} e_{ω}

(5)

\frac{d e_{ω}}{d t} = - Ω_{s} e_{θ}

(6)

1.2 无量纲降维动力学模型

对式（1）求导后代入式（4），整理求导后，再将式（4）与式（5）代入，可得：

a_{E} - a_{P} = (\ddot{r} - r ω_{s}^{2}) e_{r} + (r {\dot{ω}}_{s} + 2 \dot{r} ω_{s}) e_{θ} + r ω_{s} Ω_{s} e_{ω}

(7)

其中加速度a_E、a_P包含引力项与推力项两部分。

考虑到近距离追逃博弈

\{\begin{matrix} r ≪ {‖r_{E}‖}_{2} \\ r ≪ {‖r_{P}‖}_{2} \end{matrix}

(8)

的特点，忽略引力项差值的影响，定义

v_{r} ≜ \dot{r} ， v_{θ} ≜ r ω_{s}

，整理式（7）得：

u_{E} - u_{P} = ({\dot{v}}_{r} - \frac{v_{θ}^{2}}{r}) e_{r} + ({\dot{v}}_{θ} + \frac{v_{r} v_{θ}}{r}) e_{θ} + v_{θ} Ω_{s} e_{ω}

(9)

其中u_E、u_P为E与P加速度中的推力项，即博弈双方的控制量。

由式（9）可知：加速度在e_ω轴的分量只改变速度量Ω_s与v_θ，不影响相对距离r。考虑到追逃问题通常关注相对距离变化，目标集约束只包含终端相对距离，追逃动力学模型可以降维简化如下：

\{\begin{array}{l} \dot{r} = v_{r} \\ {\dot{v}}_{r} - \frac{v_{θ}^{2}}{r} = u_{E}^{r} - u_{P}^{r} \\ {\dot{v}}_{θ} + \frac{v_{r} v_{θ}}{r} = u_{E}^{θ} - u_{P}^{θ} \end{array}

(10)

引入参考距离r_ref、参考速度v_ref进行去量纲化处理，可以得到无量纲降维动力学模型：

\{\begin{array}{l} \frac{d x_{r}}{d \hat{t}} = x_{ω} \\ \frac{d x_{ω}}{d \hat{t}} = \frac{x_{σ}^{2}}{x_{r}} + \frac{r_{ref}}{v_{ref}^{2}} (u_{E}^{r} - u_{P}^{r}) \\ \frac{d x_{σ}}{d \hat{t}} = - \frac{x_{r} x_{σ}}{x_{r}} + \frac{r_{ref}}{v_{ref}^{2}} (u_{E}^{θ} - u_{P}^{θ}) \end{array}

(11)

其中：

\{\begin{matrix} x_{r} ≜ \frac{r}{r_{r e f}} \\ x_{ω} ≜ \frac{v_{r}}{v_{r e f}} \\ x_{σ} ≜ \frac{v_{θ}}{v_{r e f}} \\ \hat{t} ≜ \frac{v_{r e f}}{r_{r e f}} t \end{matrix}

(12)

1.3 微分对策

基于微分对策理论描述轨道追逃博弈问题通常分为动力学模型、目标集和支付泛函三部分^[2]。动力学模型一般使用常微分方程描述如下：

\{\begin{array}{l} \dot{x} = f (x, u_{E}, u_{P}, t) \\ x (t_{0}) = x_{0} \end{array}

(13)

式中，

x \in R^{n}

表示博弈系统状态量，

u_{E} \in U_{E} \subset R^{n_{E}}

与

u_{P} \in U_{P} \subset R^{n_{P}}

分别表示E与P的控制函数。函数

f : R^{n} \times U_{E} \times U_{P} \times [t_{0} ， t_{f}] \to R^{n}

有界且Lipschitz连续。

目标集G₀表示终端约束：

G_{0} = \{x \in R^{n} ∣ ψ (x (t_{f}), t_{f}) = 0_{n^{ψ} \times 1}\}

(14)

其中

G_{0} \subseteq R^{n}

一般为闭集，可以记为一个有界且Lipschitz连续的函数g:

R^{n} \to R

的零水平子集，即：

G_{0} = \{x ∣ x \in R^{n}, g (x (t_{f})) ⩽ 0\}

(15)

支付泛函

J (u_{E}, u_{P}) = Ψ (x (t_{f}), t_{f}) + \int_{t_{0}}^{t_{f}} L (x, u_{E}, u_{P}, t) d t

(16)

分为终端支付Ψ（x（t_f），t_f）和过程支付L（x，u_E，u_P，t）两部分，前者表示终端状态对博弈结局的影响，后者表示整个博弈流程产生的消耗。

2 闭环均衡求解方法

为了消除博弈初始态势对均衡构造的影响，实现批量处理博弈场景，发展基于非线性动力学的轨道追逃求解方法，本文通过逆向分析，综合运用Bellman最优性原理、有限差分法和插值技术进行均衡构造与策略设计。该方法主要分为问题重构、PDE求解、控制与轨迹生成三部分，方法整体流程如图2所示，其中箭头表示数据流向，箭头上方标签表示数据名称，方框表示执行操作。

首先基于博弈问题的动力学模型f、目标集G₀以及支付泛函J（u_E，u_P）进行重构，获得描述支付泛函动态演化的HJI PDE；其次选择状态空间子集作为计算区域Λ并划分网格Λ_grid；然后使用WENO-TVD求解器，配合终端支付，在Λ_grid节点上计算HJI PDE终值问题数值解

\hat{J} （ t ， x ）

用于分析演化规律，同步记录数值梯度

\nabla_{x} \hat{J} （ t ， x ）

用于生成插值函数

\nabla_{x} {\hat{J}}_{1}

；在完成上述准备工作之后，将代求博弈问题初始态势组成初值序列，与

f ， \nabla_{x} {\hat{J}}_{1}

一起输入RK（Runge-Kutta）积分器求解轨迹与控制序列。接下来对方法各部分进行详细介绍。

图2方法流程

Fig.2Method process

2.1 问题重构

设u_E∈U_E与u_P∈U_P表示局中人在[t，t_f]上的各自控制，则均衡可以用式（17）表示。根据Bellman最优性原理，最优策略（

u_{E}^{*} ， u_{P}^{*}

）的求解过程可以分为两步：先选择区间[t+Δt，t_f]上的最优控制函数，再选择区间[t，t+Δt）上的最优控制函数。这样式（17）可以写为如式（18）所示形式。

\begin{matrix} J^{*} (x (t), t) = \underset{u_{E} \in U_{E}}{m a x} \underset{u_{P} \in U_{P}}{m i n} [Ψ (x (t_{f}), t_{f}) + \\ \int_{t}^{t_{f}} L (x, u_{E}, u_{P}, t) d t] \end{matrix}

(17)

\begin{matrix} J^{*} (x (t), t) \\ = \underset{u_{E} \in U_{E} [t, t + Δ t)}{m a x} \underset{u_{P} \in U_{P} [t, t + Δ t]}{m i n} \{\underset{u_{E} \in U_{E} [t + Δ t, t_{f}]}{m a x} \underset{u_{P} \in U_{P} [t + Δ t, t_{f}]}{m i n} [\int_{t}^{t + Δ t} L (x, u_{E}, u_{P}, t) d τ + \int_{t + Δ t}^{t_{f}} L (x, u_{E}, u_{P}, t) d τ + Ψ (x (t_{f}), t_{f})]\} \end{matrix}

(18)

式中

\int_{t}^{t + Δ t} L (x ， u_{E} ， u_{P} ， t) d τ

与区间[t+Δt，t_f]上的控制无关，并且

\begin{matrix} J^{*} (x (t + Δ t), t + Δ t) \\ ≜ \underset{u_{E} \in U_{E} [t + Δ t, t_{f}]}{m a x} \underset{u_{P} \in U_{P} [t + Δ t, t_{f}]}{m i n} [Ψ (x (t_{f}), t_{f}) + \\ \int_{t + Δ t}^{t_{f}} L (x, u_{E}, u_{P}, t) d τ] \end{matrix}

(19)

则式（18）可以改写为：

\begin{matrix} J^{*} (x (t), t) \\ = \underset{u_{E} \in U_{E} [t, t + Δ t)}{m a x} \underset{u_{P} \in U_{P} [t, t + Δ t)}{m i n} [\int_{t}^{t + Δ t} L (x, u_{E}, u_{P}, t) d τ] + \\ J^{*} (x (t + Δ t), t + Δ t) \end{matrix}

(20)

根据积分中值定理，式（20）的第一项可以改写成：

\begin{matrix} \int_{t}^{t + Δ t} L (x, u_{E}, u_{P}, t) d τ = L (x (t + α Δ t), u_{E} (t + α Δ t) \\ u_{P} (t + α Δ t), t + α Δ t) Δ t \end{matrix}

(21)

式中，α∈（0，1）。基于连续可微的假设，式（20）的第二项可以展开成如下Taylor级数：

\begin{matrix} J^{*} (x (t + Δ t), t + Δ t) \\ = J^{*} (x (t), t) + {[\frac{\partial J^{*} (x (t), t)}{\partial x (t)}]}^{T} \frac{d x (t)}{d t} Δ t + \\ \frac{\partial J^{*} (x (t), t)}{\partial t} Δ t + o [(Δ t)^{2}] \end{matrix}

(22)

式中，o[（Δ_t）²]表示Δt的高阶小量。将式（21）与式（22）代入式（18）可知：

\begin{matrix} - \frac{\partial J^{*} (x (t), t)}{\partial t} \\ = \underset{u_{E} \in U_{E} [t, t + Δ t)}{m a x} \underset{u_{P} \in U_{P} [t, t + Δ t)}{m i n} \{{[\frac{\partial J^{*} (x (t), t)}{\partial x (t)}]}^{T} f (x, u_{E}, u_{P}) + \\ L (x (t + α Δ t), u_{E} (t + α Δ t), u_{P} (t + α Δ t), t + α Δ t) + \\ \frac{o [(Δ t)^{2}]}{Δ t}\} \end{matrix}

(23)

上式令Δt→0可得：

\begin{matrix} - \frac{\partial J^{*} (x, t)}{\partial t} = \underset{u_{E} \in U_{E}}{m a x} \underset{P \in U_{P}}{m i n} \{L (x, u_{E}, u_{P}, t) + \\ {[\frac{\partial J^{*} (x, t)}{\partial x}]}^{T} f (x, u_{E}, u_{P}, t)\} \end{matrix}

(24)

式（24）称为HJI PDE，属于泛函与PDE的一种混合形式。配合约束条件

J^{*} (x (t_{f}), t_{f}) = Ψ (x (t_{f}), t_{f})

(25)

原问题便转化为偏微分方程终值问题。

通过定义Hamilton函数

\begin{matrix} H (x, u_{E}, u_{P}, t, \frac{\partial J^{*}}{\partial x}) \\ ≜ L (x, u_{E}, u_{P}, t) + {[\frac{\partial J^{*} (x, t)}{\partial x}]}^{T} f (x, u_{E}, u_{P}, t) \end{matrix}

(26)

并根据极值原理推导最优闭环控制函数的隐式解：

\{\begin{matrix} u_{E}^{*} = u_{E}^{*} (x, \frac{\partial J^{*}}{\partial x}) \\ u_{P}^{*} = u_{P}^{*} (x, \frac{\partial J^{*}}{\partial x}) \end{matrix}

(27)

原方程可以转化为：

\{\begin{matrix} \frac{\partial J^{*} (x, t)}{\partial t} + H (x, \frac{\partial J^{*} (x, t)}{\partial x}) = 0 \\ J^{*} (x (t_{f}), t_{f}) = Ψ (x (t_{f}), t_{f}) \end{matrix}

(28)

后续方程计算过程中，省略均衡J^*（x（t），t）的上标，简记为J。

2.2 PDE求解

2.2.1 网格生成

在综合考虑动力学模型的适用范围以及计算消耗之后，需要将空间计算域限制在一个有限区域Λ内，并在此区域内生成均匀Cartesian网格^[19]，对均衡J进行离散：

J = J_{d_{1}, d_{2}, \dots, d_{n}}^{k}

(29)

式中：下标d₁，d₂，···，d_n分别表示网格节点（x₁，x₂，···，x_n）的坐标；上标k表示时间步。

在设置各空间维度的步长Δx₁，Δx₂，···，Δx_n以及时间步长Δt时，为了保证计算稳定性，需要满足柯朗-弗里德里希斯-列维（Courant-Friedreichs-Lewy，CFL）条件^[20]:

Δ t m a x (\sum_{i = 1}^{n} \frac{|\partial H / \partial J_{x_{i}}|}{Δ x_{i}}) = α_{C F L}

(30)

式中，J_xi表示J对x_i的偏导数，α_CFL∈（0，1）表示CFL数。

2.2.2 空间离散

网格表示允许以一种直接的方式计算均衡J的空间梯度

\nabla_{x} J

。这里使用了五阶WENO格式^[21]进行近似计算。以J关于x₁的偏导数

J_{x_{1}} ≜ \partial J / \partial x_{1}

的计算为例进行说明。令J关于x₁的一阶均差定义在网格节点中点处：

D_{d_{1} + 1 / 2}^{1} J ≜ \frac{J_{d_{1} + 1, d_{2}, \dots, d_{n}}^{k} - J_{d_{1}, d_{2}, \dots, d_{n}}^{k}}{Δ x_{1}}

(31)

记：

\{\begin{matrix} {(D^{-} J)}_{d_{1}} ≜ D_{d_{1} - 1 / 2}^{1} J \\ {(D^{+} J)}_{d_{1}} ≜ D_{d_{1} + 1 / 2}^{1} J \end{matrix}

(32)

取模板

\{J_{d_{1} - 3 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} - 2 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} - 1 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} + 1 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} + 2 ， d_{2} ， \dots ， d_{n}}^{k}\}

，定义

\{\begin{matrix} v_{1} ≜ {(D^{-} J)}_{d_{1} - 2} \\ v_{2} ≜ {(D^{-} J)}_{d_{1} - 1} \\ v_{3} ≜ {(D^{-} J)}_{d_{1}} \\ v_{4} ≜ {(D^{-} J)}_{d_{1} + 1} \\ v_{5} ≜ {(D^{-} J)}_{d_{1} + 2} \end{matrix}

(33)

偏导数左近似

J_{x_{1}}^{-}

可以根据下式计算：

J_{x_{1}}^{-} = ω_{1} {(J_{x_{1}}^{-})}_{1} + ω_{2} {(J_{x_{1}}^{-})}_{2} + ω_{3} {(J_{x_{1}}^{-})}_{3}

(34)

式中，

\{\begin{matrix} {(J_{x_{1}}^{-})}_{1} ≜ \frac{1}{3} v_{1} - \frac{7}{6} v_{2} + \frac{11}{6} v_{3} \\ {(J_{x_{1}}^{-})}_{2} ≜ - \frac{1}{6} v_{2} + \frac{5}{6} v_{3} + \frac{1}{3} v_{4} \\ {(J_{x_{1}}^{-})}_{3} ≜ \frac{1}{3} v_{3} + \frac{5}{6} v_{4} - \frac{1}{6} v_{5} \end{matrix}

(35)

非线性权重ω_m（m=1，2，3）为

\{\begin{matrix} ω_{m} = \frac{α_{m}}{α_{1} + α_{2} + α_{3}} \\ α_{m} = \frac{c_{m}}{{(ε + S_{m})}^{2}} \end{matrix}

(36)

其中，ε为灵敏度参数，是一个小量，用于避免分母为零，其取值一定程度上影响收敛精度^[22]，通常取为ε=10^-6；c_m为理想权重，以确保在光滑区域能够获得最优近似，其取值分别为c₁=0.1、c₂=0.6、c₃=0.3；S_m为光滑因子，表达式为

\{\begin{matrix} S_{1} = \frac{13}{12} {(v_{1} - 2 v_{2} + v_{3})}^{2} + \frac{1}{4} {(v_{1} - 4 v_{2} + 3 v_{3})}^{2} \\ S_{2} = \frac{13}{12} {(v_{2} - 2 v_{3} + v_{4})}^{2} + \frac{1}{4} {(v_{2} - v_{4})}^{2} \\ S_{3} = \frac{13}{12} {(v_{3} - 2 v_{4} + v_{5})}^{2} + \frac{1}{4} {(3 v_{3} - 4 v_{4} + v_{5})}^{2} \end{matrix}

(37)

取

\{J_{d_{1} - 2 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} - 1 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} + 1 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} + 2 ， d_{2} ， \dots ， d_{n}}^{k} ， J_{d_{1} + 3 ， d_{2} ， \dots ， d_{n}}^{k}\}

，定义

\{\begin{matrix} v_{1} ≜ {(D^{+} J)}_{d_{1} - 2} \\ v_{2} ≜ {(D^{+} J)}_{d_{1} - 1} \\ v_{3} ≜ {(D^{+} J)}_{d_{1}} \\ v_{4} ≜ {(D^{+} J)}_{d_{1} + 1} \\ v_{5} ≜ {(D^{+} J)}_{d_{1} + 2} \end{matrix}

(38)

重复上述计算流程便可以获得偏导数右近似

J_{x_{1}}^{+}

。将d₁替换为d₂，d₃，···，d_n可以计算J关于x₂，x₃，···，x_n的偏导数左右近似，组成空间梯度的左右近似：

{(\nabla_{x} J)}^{-} ≜ [J_{x_{1}}^{-}; J_{x_{2}}^{-}; \dots; J_{x_{n}}^{-}]

(39)

{(\nabla_{x} J)}^{+} ≜ [J_{x_{1}}^{+}; J_{x_{2}}^{+}; \dots; J_{x_{n}}^{+}]

(40)

而Hamilton量的近似

\hat{H}

则使用Lax-Friedrichs格式^[23]计算：

\hat{H} = H (x, \frac{{(\nabla_{x} J)}^{+} + {(\nabla_{x} J)}^{-}}{2}) - \sum_{i = 1}^{n} \frac{β_{i}}{2} (J_{x_{i}}^{+} - J_{x_{i}}^{-})

(41)

式中，

β_{i} = \underset{[J_{x_{i}}^{max}, J_{x_{i}}^{max}]}{m a x} |\frac{\partial H}{\partial J_{x_{i}}}|

(42)

表示各维度耗散系数，用于控制数值黏度。

2.2.3 时间离散

时间离散则采用的是三阶TVD RK方法^[24]。首先通过一次Euler步，将方程数值解推进至t^k+Δt时刻：

\frac{J^{k + 1} - J^{k}}{Δ t} + {\hat{H}}^{k} = 0

(43)

然后再经过一次Euler步，将均衡值推进至t^k+2Δt时刻:

\frac{J^{k + 2} - J^{k + 1}}{Δ t} + {\hat{H}}^{k + 1} = 0

(44)

接着基于输入数据和外推结果的平均步骤

J^{k + 1 / 2} = \frac{3}{4} J^{k} + \frac{1}{4} J^{k + 2}

(45)

提供均衡J在

t^{k} + \frac{1}{2} Δ t

时刻的数值。之后再经历一次Euler步的计算，可以将数值解外推至t^k+

\frac{3}{2} Δ t

时刻：

\frac{J^{k + 3 / 2} - J^{k + 1 / 2}}{Δ t} + {\hat{H}}^{k + 1 / 2} = 0

(46)

最终再次通过平均步骤

J^{k + 1} = \frac{1}{3} J^{k} + \frac{2}{3} J^{k + 3 / 2}

(47)

生成均衡在t^k+Δt时刻的三阶精度TVD近似。

2.3 控制与轨迹生成

在根据WENO-TVD求解器完成计算域Λ内均衡演化的计算之后，可以获得各网格节点上的均衡数值

\hat{J} （ t ， x ）

和相应的数值梯度

\nabla_{x} \hat{J} （ t ， x ）

。其中，

\hat{J} （ t ， x ）

被用于展示均衡的受动力学驱动的动态演化过程，分析博弈态势，研判航天器威胁关系；

\nabla_{x} \hat{J} （ t ， x ）

则用于构造三次样条插值函数

\nabla_{x} {\hat{J}}_{1} （ t ， x ）

，与动力学模型式（11）和控制隐式解式（27）一同构成基于RK方法的数值积分器。之后，只需要向积分器输入场景初始态势序列以及博弈时间便可以构造闭环控制函数，外推闭环均衡轨迹。

3 仿真算例

3.1 场景通用参数设置

首先明确去量纲参数r_ref=100 km、v_ref=100 m/s，并随机设置了6组博弈初始态势，用于展示博弈场景结果，初始态势详细信息记录在表1内。

表1轨道追逃博弈初始场景

Tab.1Initial situation of orbital pursuit-evasion game

其次设置相对距离为终端约束，明确追逃问题目标集为：

G_{0} = \{x \in R^{3} ∣ x_{r} ⩽ 0.02\}

(48)

继而确定相应g（x）：

g (x) = x_{r} - 0.02

(49)

然后使用Bolza型性能指标表示支付泛函：

J (u_{E}, u_{P}) = Ψ (x ({\hat{t}}_{f}), {\hat{t}}_{f}) + \int_{{\hat{t}}_{0}}^{{\hat{t}}_{f}} L (u_{E}, u_{P}) d \hat{t}

(50)

其中，终端支付为：

Ψ (x ({\hat{t}}_{f}), {\hat{t}}_{f}) = \{\begin{matrix} 0 x ({\hat{t}}_{f}) \in G_{0} \\ 10 其 他 \end{matrix}

(51)

过程支付为：

L (u_{E}, u_{P}) = \frac{r_{ref}^{2}}{v_{ref}^{4}} [{(u_{P})}^{T} R_{P}^{T} R_{P} u_{P} - {(u_{E})}^{T} R_{E}^{T} R_{E} u_{E}]

(52)

4 结论

本文针对近距离轨道追逃问题，提出了一种综合运用Bellman最优性原理、有限差分法和插值技术的闭环均衡构造、控制函数生成方法。主要研究结果如下：

1）使用视线旋转坐标系下的降维动力学模型，实现追逃系统状态空间降维，降低了问题的复杂度。

2）基于Bellman最优性原理构造HJI PDE，描述均衡解受动力学驱动的演化过程。通过逆向分析，消除博弈场景初值约束，实现同时处理多组博弈场景。

3）利用Cartesian网格离散状态空间，使用有限差分法构造WENO-TVD求解器计算HJI PDE终值问题，为博弈态势分析和共性规律总结提供支撑。

4）建立控制与均衡梯度的关系，通过数值插值计算当前博弈态势下的均衡及其空间梯度，配合动力学模型使用数值方法外推均衡轨迹以及完整的闭环控制函数，实现非线性追逃问题闭环均衡求解。

需要指出的是，当前方法在以下方面存在限制：

1）降维动力学的推导是基于追逃目标集不包含Ω_s进行的，当目标集中包含Ω_s约束时，无法实现降维；

2）该方法使用的是均匀Cartesian网格，目前只适用于规则区域；

3）该方法需要划分网格，当面对高维动力学模型驱动的博弈问题时，维度的增加会导致网格点数呈指数级增加，即出现维数灾难。

未来将就方法局限性开展针对性研究：

1）针对复杂目标集、高维状态空间、强非线性动力学模型以及多局中人博弈场景进行理论研究，建立相应的数学模型与处理流程，总结一般性博弈规律；

2）为计算支付泛函长时间演化过程寻找稳定可靠的无网格方法，尝试与人工智能技术相结合，开发HJI PDE终值问题的智能计算方法；

3）后续进一步细化模型，考虑存在继电器特性的控制量的输出，增补考虑执行机构特性的追逃问题求解。

图1博弈双方相对位置关系

Fig.1Relative position relationship between game players

下载: 全尺寸图片

图2方法流程

Fig.2Method process

下载: 全尺寸图片

图3均衡受动力学驱动的演化过程（推力构型Ⅰ）

Fig.3Equilibrium evolution process driven by dynamics (thrust configuration Ⅰ)

下载: 全尺寸图片

图4场景1博弈过程（推力构型Ⅰ）

Fig.4Gameplay process of scenario 1 (thrust configuration Ⅰ)

下载: 全尺寸图片

图5场景2博弈过程（推力构型Ⅰ）

Fig.5Gameplay process of scenario 2 (thrust configuration Ⅰ)

下载: 全尺寸图片

图6场景3博弈过程（推力构型Ⅰ）

Fig.6Gameplay process of scenario 3 (thrust configuration Ⅰ)

下载: 全尺寸图片

图7场景4博弈过程（推力构型Ⅰ）

Fig.7Gameplay process of scenario 4 (thrust configuration Ⅰ)

下载: 全尺寸图片

图8场景5博弈过程（推力构型Ⅰ）

Fig.8Gameplay process of scenario 5 (thrust configuration Ⅰ)

下载: 全尺寸图片

图9场景6博弈过程（推力构型Ⅰ）

Fig.9Gameplay process of scenario 6 (thrust configuration Ⅰ)

下载: 全尺寸图片

图10均衡受动力学驱动的演化过程（推力构型Ⅱ）

Fig.10Equilibrium evolution process driven by dynamics (thrust configuration Ⅱ)

下载: 全尺寸图片

图11场景1博弈过程（推力构型Ⅱ）

Fig.11Gameplay process of scenario 1 (thrust configuration Ⅱ)

下载: 全尺寸图片

图12场景2博弈过程（推力构型Ⅱ）

Fig.12Gameplay process of scenario 2 (thrust configuration Ⅱ)

下载: 全尺寸图片

图13场景3博弈过程（推力构型Ⅱ）

Fig.13Gameplay process of scenario 3 (thrust configuration Ⅱ)

下载: 全尺寸图片

图14场景4博弈过程（推力构型Ⅱ）

Fig.14Gameplay process of scenario 4 (thrust configuration Ⅱ)

下载: 全尺寸图片

图15场景5博弈过程（推力构型Ⅱ）

Fig.15Gameplay process of scenario 5 (thrust configuration Ⅱ)

下载: 全尺寸图片

图16场景6博弈过程（推力构型Ⅱ）

Fig.16Gameplay process of scenario 6 (thrust configuration Ⅱ)

下载: 全尺寸图片

表1轨道追逃博弈初始场景

Tab.1Initial situation of orbital pursuit-evasion game

下载: 全尺寸图片

表2计算空间域网格信息

Tab.2Computational spatial grid information

下载: 全尺寸图片

图1博弈双方相对位置关系

Fig.1Relative position relationship between game players

图2方法流程

Fig.2Method process

图3均衡受动力学驱动的演化过程（推力构型Ⅰ）

Fig.3Equilibrium evolution process driven by dynamics (thrust configuration Ⅰ)

图4场景1博弈过程（推力构型Ⅰ）

Fig.4Gameplay process of scenario 1 (thrust configuration Ⅰ)

图5场景2博弈过程（推力构型Ⅰ）

Fig.5Gameplay process of scenario 2 (thrust configuration Ⅰ)

图6场景3博弈过程（推力构型Ⅰ）

Fig.6Gameplay process of scenario 3 (thrust configuration Ⅰ)

图7场景4博弈过程（推力构型Ⅰ）

Fig.7Gameplay process of scenario 4 (thrust configuration Ⅰ)

图8场景5博弈过程（推力构型Ⅰ）

Fig.8Gameplay process of scenario 5 (thrust configuration Ⅰ)

图9场景6博弈过程（推力构型Ⅰ）

Fig.9Gameplay process of scenario 6 (thrust configuration Ⅰ)

图10均衡受动力学驱动的演化过程（推力构型Ⅱ）

Fig.10Equilibrium evolution process driven by dynamics (thrust configuration Ⅱ)

图11场景1博弈过程（推力构型Ⅱ）

Fig.11Gameplay process of scenario 1 (thrust configuration Ⅱ)

图12场景2博弈过程（推力构型Ⅱ）

Fig.12Gameplay process of scenario 2 (thrust configuration Ⅱ)

图13场景3博弈过程（推力构型Ⅱ）

Fig.13Gameplay process of scenario 3 (thrust configuration Ⅱ)

图14场景4博弈过程（推力构型Ⅱ）

Fig.14Gameplay process of scenario 4 (thrust configuration Ⅱ)

图15场景5博弈过程（推力构型Ⅱ）

Fig.15Gameplay process of scenario 5 (thrust configuration Ⅱ)

图16场景6博弈过程（推力构型Ⅱ）

Fig.16Gameplay process of scenario 6 (thrust configuration Ⅱ)

表1轨道追逃博弈初始场景

Tab.1Initial situation of orbital pursuit-evasion game

表2计算空间域网格信息

Tab.2Computational spatial grid information

引用提醒

图(16) / 表(2)

引用本文

杨傅云翔, 杨乐平, 柴华. 计算轨道追逃闭环均衡的有限差分方法[J]. 国防科技大学学报, 2026, 48(1): 247-261.

复制

YANG F Y X, YANG L P, CHAI H. A finite difference method for calculating the closed-loop equilibrium of orbital pursuit-evasion game[J]. Journal of National University of Defense Technology, 2026, 48(1): 247-261.

Copy

计量

图1博弈双方相对位置关系

Fig.1Relative position relationship between game players

图2方法流程

Fig.2Method process

图3均衡受动力学驱动的演化过程（推力构型Ⅰ）

Fig.3Equilibrium evolution process driven by dynamics (thrust configuration Ⅰ)

图4场景1博弈过程（推力构型Ⅰ）

Fig.4Gameplay process of scenario 1 (thrust configuration Ⅰ)

图5场景2博弈过程（推力构型Ⅰ）

Fig.5Gameplay process of scenario 2 (thrust configuration Ⅰ)

图6场景3博弈过程（推力构型Ⅰ）

Fig.6Gameplay process of scenario 3 (thrust configuration Ⅰ)

图7场景4博弈过程（推力构型Ⅰ）

Fig.7Gameplay process of scenario 4 (thrust configuration Ⅰ)

图8场景5博弈过程（推力构型Ⅰ）

Fig.8Gameplay process of scenario 5 (thrust configuration Ⅰ)

图9场景6博弈过程（推力构型Ⅰ）

Fig.9Gameplay process of scenario 6 (thrust configuration Ⅰ)

图10均衡受动力学驱动的演化过程（推力构型Ⅱ）

Fig.10Equilibrium evolution process driven by dynamics (thrust configuration Ⅱ)

图11场景1博弈过程（推力构型Ⅱ）

Fig.11Gameplay process of scenario 1 (thrust configuration Ⅱ)

图12场景2博弈过程（推力构型Ⅱ）

Fig.12Gameplay process of scenario 2 (thrust configuration Ⅱ)

图13场景3博弈过程（推力构型Ⅱ）

Fig.13Gameplay process of scenario 3 (thrust configuration Ⅱ)

图14场景4博弈过程（推力构型Ⅱ）

Fig.14Gameplay process of scenario 4 (thrust configuration Ⅱ)

图15场景5博弈过程（推力构型Ⅱ）

Fig.15Gameplay process of scenario 5 (thrust configuration Ⅱ)

图16场景6博弈过程（推力构型Ⅱ）

Fig.16Gameplay process of scenario 6 (thrust configuration Ⅱ)

表1轨道追逃博弈初始场景

Tab.1Initial situation of orbital pursuit-evasion game

表2计算空间域网格信息

Tab.2Computational spatial grid information

罗亚中, 李振瑜, 祝海. 航天器轨道追逃微分对策研究综述[J]. 中国科学(技术科学),2020,50(12):1533-1545.LUO Y Z, LI Z Y, ZHU H. Survey on spacecraft orbital pursuit-evasion differential games[J]. Scientia Sinica(Technologica),2020,50(12):1533-1545.(in Chinese)

朱彦伟, 张乘铭, 杨傅云翔, 等. 航天器轨道追逃动力学与控制问题研究综述[J]. 国防科技大学学报,2024,46(3):1-11.ZHU Y W, ZHANG C M, YANG F Y X,et al. Survey on dynamics and control problem research in spacecraft orbital pursuit-evasion game[J]. Journal of National University of Defense Technology,2024,46(3):1-11.(in Chinese)

WISZNIEWSKA-MATYSZKIEL A. Open and closed loop Nash equilibria in games with a continuum of players[J]. Journal of Optimization Theory and Applications,2014,160(1):280-301.

EHTAMO H, RAIVIO T. On applied nonlinear and bilevel programming or pursuit-evasion games[J]. Journal of Optimization Theory and Applications,2001,108(1):65-96.

LUKE S. Game theory applications in astrodynamics and space domain awareness[D]. Tuscaloosa: University of Alabama,2021.

ZHANG C M, ZHU Y W, YANG L P,et al. Numerical solution for elliptical orbit pursuit-evasion game via deep neural networks and pseudospectral method[J]. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering,2023,237(4):796-808.

祝海. 基于微分对策的航天器轨道追逃最优控制策略[D]. 长沙: 国防科学技术大学,2017.ZHU H. Optimal control of spacecraft orbital pursuit-evasion based on differential game[D]. Changsha: National University of Defense Technology,2017.(in Chinese)

WOODBURY T D, HURTADO J E. Adaptive play via estimation in uncertain nonzero-sum orbital pursuit evasion games[C]//Proceedings of AIAA Space and Astronautics Forum and Exposition,2017:5247.

YE D, SHI M M, SUN Z W. Satellite proximate interception vector guidance based on differential games[J]. Chinese Journal of Aeronautics,2018,31(6):1352-1361.

YE D, SHI M M, SUN Z W. Satellite proximate pursuit-evasion game with different thrust configurations[J]. Aerospace Science and Technology,2020,99:105715.

PRINCE E R, HESS J A, COBB R G,et al. Elliptical orbit proximity operations differential games[J]. Journal of Guidance Control and Dynamics,2019,42(7):1458-1472.

HAFER W T. Sensitivity methods applied to orbital pursuit-evasion[D]. College Station: Texas A&M University,2014.

LI Z Y, ZHU H, YANG Z,et al. A dimension-reduction solution of free-time differential games for spacecraft pursuit-evasion[J]. Acta Astronautica,2019,163:201-210.

JAGAT A, SINCLAIR A J. Optimization of spacecraft pursuit-evasion game trajectories in the Euler-Hill reference frame[C]//Proceedings of AIAA/AAS Astrodynamics Specialist Conference. AIAA,2014:4131.

JAGAT A, SINCLAIR A J. Nonlinear control for spacecraft pursuit-evasion game using the state-dependent riccati equation method[J]. IEEE Transactions on Aerospace and Electronic Systems,2017,53(6):3032-3042.

STUPIK J M. Optimal pursuit/evasion spacecraft trajectories in the Hill reference frame[D]. Champaign: University of Illinos at Urbana-Champaign,2013.

YANG F Y X, YANG L P, ZHU Y W,et al. A DNN based trajectory optimization method for intercepting non-cooperative maneuvering spacecraft[J]. Journal of Systems Engineering and Electronics,2022,33(2):438-446.

张乘铭, 朱彦伟, 杨乐平, 等. 航天器交会型轨道追逃策略的滚动时域优化[J]. 国防科技大学学报,2024,46(3):21-29.ZHANG C M, ZHU Y W, YANG L P,et al. Receding horizon optimization for spacecraft pursuit-evasion strategy in rendezvous[J]. Journal of National University of Defense Technology,2024,46(3):21-29.(in Chinese)

BECKERS D, ELDREDGE J D. Planar potential flow on Cartesian grids[J]. Journal Fluid Mechanics,2022,941: A19.

STRIKWERDA J C. Finite difference schemes and partial differential equations[M]. Philadelphia: Society for Industrial and Applied Mathematics,2004.

AHMAT M, QIU J X. Direct WENO scheme for dispersion-type equations[J]. Mathematics and Computers in Simulation,2023,204:216-229.

JIANG G S, SHU C W. Efficient implementation of weighted ENO schemes[J]. Journal of Computational Physics,1996,126(1):202-228.

JIANG Y Q, ZHOU S G, ZHANG X,et al. High-order weighted compact nonlinear scheme for one-and two-dimensional Hamilton-Jacobi equations[J]. Applied Numerical Mathematics,2022,171:353-368.

SHU C W, OSHER S. Efficient implementation of essentially non-oscillatory shock-capturing schemes[J]. Journal of Computational Physics,1988,77(2):439-471.

Home

About Journal

Guide for Authors

Editorial Board

Publication Statement

Open Access Statement

Contact

Journal Subscription

Rss

AI assistant

Chinese

1 问题建模

2 闭环均衡求解方法

3 仿真算例

4 结论