Cite this article: DAI Jin, WANG Tianyu, WANG Shaowei. Network traffic classification method based on deep forest[J]. Journal of National University of Defense Technology, 2020, 42(4): 30-34.
DOI: 10.11887/j.cn.202004006
Received: 2019-12-25
Funding: National Natural Science Foundation of China (61801208, 61671233, 61931023, U1936202)

Network traffic classification method based on deep forest
DAI Jin1,2, WANG Tianyu2,3, WANG Shaowei2
(1. School of Information Science and Engineering, Jinling College, Nanjing University, Nanjing 210089, China; 2. School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China; 3. National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China)
Abstract:
With the rapid development of network applications, Internet traffic classification has attracted wide attention in research areas such as network resource allocation, traffic scheduling, and network security. Existing machine-learning-based traffic classification methods place strict requirements on the selection and distribution of flow features, which makes it difficult to improve classification accuracy and stability on the complex, changeable traffic encountered in practice. To mitigate the adverse impact of feature complexity on classification performance, a traffic classification method based on deep forest is introduced. The method combines a cascade forest with a multi-grained scanning mechanism, and it improves overall classification performance even when the number of samples and the number of selected features are limited. Support vector machine, random forest, and deep forest classifiers were trained and tested on Moore, a public network traffic data set. The experimental results show that the deep forest classifier reaches a classification accuracy of 96.36%, outperforming the other machine learning models.
Keywords: feature selection; multi-grained cascade forest; machine learning; network traffic classification
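
The two mechanisms named in the abstract, multi-grained scanning and the cascade forest, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows only the data flow usually associated with deep forest: sliding windows cut a flow's feature vector into sub-vectors at several window sizes, and each cascade level passes its forests' class-probability vectors to the next level together with the input features. Forest training is omitted, and the function names, the 8-feature flow record, the window sizes, and the probability values are all hypothetical.

```python
from typing import Dict, List, Sequence


def sliding_windows(features: Sequence[float], window: int, stride: int = 1) -> List[List[float]]:
    """Return every contiguous sub-vector of length `window`, moving `stride` positions at a time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if window > len(features):
        raise ValueError("window cannot exceed the number of features")
    last_start = len(features) - window
    return [list(features[i:i + window]) for i in range(0, last_start + 1, stride)]


def multi_grained_scan(features: Sequence[float], windows: Sequence[int]) -> Dict[int, List[List[float]]]:
    """Scan one flow's feature vector at several granularities, one window size each."""
    return {w: sliding_windows(features, w) for w in windows}


def augment_for_next_level(features: Sequence[float],
                           class_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Cascade step: append the class-probability vectors from one level's forests to the input features."""
    augmented = list(features)
    for vector in class_vectors:
        augmented.extend(vector)
    return augmented


if __name__ == "__main__":
    # Hypothetical 8-feature flow record (values are made up for illustration).
    flow = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7, 0.8]
    scanned = multi_grained_scan(flow, windows=[4, 6])
    # Window 4 yields 8 - 4 + 1 = 5 sub-vectors; window 6 yields 8 - 6 + 1 = 3.
    print({w: len(subs) for w, subs in scanned.items()})
    # Hypothetical outputs of two forests on a 3-class problem: 8 + 2 * 3 = 14 values.
    print(len(augment_for_next_level(flow, [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])))
```

In a full classifier, each sub-vector would be passed to trained forests, and the augmented vector would be the input to the next cascade level. Both steps are left out here to keep the sketch free of external dependencies.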