Workflows scheduling powered by execution time prediction model
2024,46(5):228-238
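HU Yahong
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China, huyahong@zjut.edu.cn
QIU Yuanyuan
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
MAO Jiafa
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China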
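Abstract:
To address the workflow job scheduling problem, a critical path method is proposed for predicting workflow execution time and allocating resources. The workflow execution time prediction algorithm describes the execution order of the sub-jobs in a workflow with a parallel application directed acyclic graph (DAG). Based on this order, system resources are logically allocated to the sub-jobs. A gradient boosting decision tree (GBDT) model then predicts the execution time of each sub-job from its characteristics and its resource allocation, and the critical path of the workflow is computed. The workflow execution time is the sum of the predicted execution times of the sub-jobs on the critical path. If the predicted workflow execution time meets the user's requirement, the sub-jobs are scheduled according to their execution order and the resource allocation plan, and the workflow is executed. Comparative experiments show execution-time prediction errors of 5.72% and 1.57% for the two workflows, respectively. Compared with Spark's default scheduling algorithm, the proposed workflow scheduling algorithm reduces the completion times of the two workflows by 15.71% and 15.44%, respectively.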
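As a rough illustration of the critical path step, the Python sketch below computes the longest path through a sub-job DAG, assuming the per-sub-job execution times have already been predicted (for example by a trained GBDT model, which is not shown). The functions `critical_path` and `relative_error`, the sub-job names, and all numbers are hypothetical and for illustration only; the sketch also treats resource allocation only implicitly, through its effect on the predicted times.

```python
from graphlib import TopologicalSorter


def critical_path(durations: dict[str, float],
                  deps: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Return (length, path) of the longest path through a sub-job DAG.

    durations: sub-job -> predicted execution time (assumed to come from a GBDT model)
    deps: sub-job -> set of sub-jobs that must finish before it can start
    """
    finish: dict[str, float] = {}  # earliest finish time of each sub-job
    parent: dict[str, str | None] = {}  # predecessor on the longest path into each sub-job
    # static_order() yields every sub-job after all of its predecessors
    for job in TopologicalSorter(deps).static_order():
        preds = deps.get(job, set())
        # A sub-job can start only when its latest-finishing predecessor is done
        best = max(preds, key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0.0
        finish[job] = start + durations[job]
        parent[job] = best
    # The workflow ends when its latest-finishing sub-job ends
    end = max(finish, key=finish.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return finish[end], path


def relative_error(predicted: float, actual: float) -> float:
    """Percentage prediction error: |predicted - actual| / actual * 100."""
    return 100.0 * abs(predicted - actual) / actual


if __name__ == "__main__":
    # Hypothetical workflow and predicted times (illustrative, not from the paper)
    predicted = {"load": 30.0, "clean": 20.0, "features": 45.0, "train": 60.0}
    deps = {"clean": {"load"}, "features": {"load"}, "train": {"clean", "features"}}
    length, path = critical_path(predicted, deps)
    deadline = 150.0  # hypothetical user requirement
    print(path, length, length <= deadline)  # ['load', 'features', 'train'] 135.0 True
```

Because each sub-job starts exactly when its latest predecessor finishes, the finish time of the last sub-job equals the sum of the predicted times along the critical path, which is the value compared with the user's requirement. `TopologicalSorter` raises `graphlib.CycleError` if the dependency graph contains a cycle.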
Funding:
National Key Research and Development Program of China (2018YFB0204003)
Received:
2022-05-21