Abstract: To address the problem of workflow job scheduling, a method based on the critical path is proposed to predict the execution time of a workflow and to allocate resources. In the workflow execution time prediction algorithm, the directed acyclic graph (DAG) of the parallel application describes the dependencies among the sub-jobs of a workflow, from which the execution order of the sub-jobs is derived. Based on this order, system resources are logically allocated to the sub-jobs. Using the characteristics and resource allocation information of each sub-job, an algorithm based on gradient boosting decision trees (GBDT) predicts the sub-job execution times, and the critical path of the workflow is then calculated. The sum of the completion times of all sub-jobs on the critical path is taken as the execution time of the workflow. If the predicted workflow execution time satisfies the user's requirements, the jobs are scheduled according to the sub-job execution sequence and the resource allocation scheme, and the workflow is executed. Comparative experiments show that the prediction errors of the execution time for two workflows are 5.72% and 1.57%, respectively. Compared with the default scheduling algorithm of Spark, the proposed workflow scheduling algorithm reduces the completion time of the two workflows by 15.71% and 15.44%, respectively.
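The critical-path step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `critical_path`, the sub-job names, and the per-sub-job times are hypothetical, and the times are hard-coded stand-ins for what the GBDT-based predictor would produce. The sketch assumes the workflow is a DAG given as a mapping from each sub-job to its predecessors, and it uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def critical_path(durations, deps):
    """Return (total_time, path) for the longest path through a DAG.

    durations: sub-job name -> predicted execution time (assumed given)
    deps:      sub-job name -> set of sub-jobs that must finish first
    """
    # Visit sub-jobs so that every predecessor comes before its successors.
    graph = {job: deps.get(job, ()) for job in durations}
    order = TopologicalSorter(graph).static_order()

    finish = {}  # earliest finish time of each sub-job
    prev = {}    # predecessor that determines that finish time
    for job in order:
        preds = deps.get(job, ())
        latest = max(preds, key=lambda p: finish[p], default=None)
        start = finish[latest] if latest is not None else 0.0
        finish[job] = start + durations[job]
        prev[job] = latest

    # The sub-job that finishes last ends the critical path; walk back from it.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return total, path


# Hypothetical predicted times (seconds) and dependencies for four sub-jobs.
durations = {"A": 3.0, "B": 2.0, "C": 4.0, "D": 1.0}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

total, path = critical_path(durations, deps)
print(total, path)  # prints: 8.0 ['A', 'C', 'D']
```

In this example the predicted workflow execution time is 3.0 + 4.0 + 1.0 = 8.0, the sum over the sub-jobs on the critical path A, C, D. A scheduler following the abstract would compare this value against the user's time requirement before dispatching the sub-jobs.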