Crowd profiling algorithm for mass transit data
Authors: ZHANG Jin, ZHANG Jianzhong, WANG Fei, et al.
Affiliations:

(1. College of Information Science and Engineering, Hunan Normal University, Changsha 410006, China; 2. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; 3. School of Mathematics and Statistics, Hunan Normal University, Changsha 410006, China)

Author biographies:

ZHANG Jin (born 1979), male, from Xinyang, Henan, professor, Ph.D., doctoral supervisor, E-mail: jinzhang@hunnu.edu.cn; WANG Fei (corresponding author), male, from Zongyang, Anhui, lecturer, Ph.D., E-mail: wangfei@hunnu.edu.cn

Corresponding author: WANG Fei

CLC number:

TP3-05

Funding:

National Ministries and Commissions Fund Project (31511010105); Natural Science Foundation of Hunan Province (2021JJ30456)

    Abstract:

    Crowd profiling on massive public transit data is valuable for analyzing the travel characteristics of urban groups and the state of traffic, but processing such data is time-consuming, yields low-quality results, and is hard to interpret. This paper proposes a systematic strategy for crowd profiling on massive transit data. First, the PageRank algorithm is used to select the trajectories of passengers who pass through important stations, which greatly reduces the volume of trajectory data for the target population. Second, a trajectory textualization method is proposed to improve the interpretability of the crowd profiles. Third, after analysis, the K-means algorithm based on cosine distance is chosen as the clustering algorithm for classifying crowd profiles. Experiments on the transit trip data of 30 million passengers show that the proposed strategy addresses crowd profiling on massive transit data fairly systematically, and that the K-means algorithm based on cosine distance has the best clustering effect, with an accuracy of about 80%. The crowd profiles and their trajectories are visualized with Flow Map, and the results are consistent with real-world crowd behaviour.
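
The three steps named in the abstract (PageRank-based station filtering, trajectory textualization, and K-means with cosine distance) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that a trajectory is an ordered list of station names, that "textualization" means treating each trajectory as a bag of station-name tokens, and that centroids are seeded farthest-first for reproducibility. The function names (`pagerank`, `unit_vector`, `cosine_kmeans`, `profile_crowd`), the station names, and all parameters are hypothetical.

```python
import math
from collections import Counter


def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank over directed (src, dst) station pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Rank held by stations with no outgoing edge is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1.0 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit_vector(tokens, vocab):
    """Bag-of-tokens count vector scaled to unit length."""
    counts = Counter(tokens)
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(dot(vec, vec)) or 1.0
    return [x / norm for x in vec]


def cosine_kmeans(vectors, k, iters=20):
    """K-means on unit vectors: each point joins the most cosine-similar centroid."""
    # Assumption: deterministic farthest-first seeding, starting from the first vector.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                # Re-normalise the mean so that dot product stays equal to cosine similarity.
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(dot(mean, mean)) or 1.0
                centroids[c] = [x / norm for x in mean]
    return labels


def profile_crowd(trajectories, top_n, k):
    """Filter trajectories by important stations, then cluster the remainder."""
    edges = [(a, b) for stops in trajectories.values() for a, b in zip(stops, stops[1:])]
    rank = pagerank(edges)
    important = set(sorted(rank, key=rank.get, reverse=True)[:top_n])
    # Keep only passengers whose trajectory touches an important station.
    kept = {p: stops for p, stops in trajectories.items() if important & set(stops)}
    vocab = sorted({s for stops in kept.values() for s in stops})
    vectors = [unit_vector(stops, vocab) for stops in kept.values()]
    return important, dict(zip(kept, cosine_kmeans(vectors, k)))


# Toy data (invented for illustration only).
trips = {
    "p1": ["Home_A", "Hub", "Office"],
    "p2": ["Home_B", "Hub", "Office"],
    "p3": ["Dorm", "Hub", "Campus"],
    "p4": ["Dorm_2", "Hub", "Campus"],
    "p5": ["Suburb_X", "Suburb_Y"],
}
important, labels = profile_crowd(trips, top_n=1, k=2)
print(important, labels)
```

On this toy input, the passenger who never passes through the top-ranked station is dropped before clustering, and the office-bound and campus-bound passengers fall into separate clusters. At the scale described in the abstract, a vectorized or distributed implementation would be needed; the sketch only shows how the three steps fit together.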

Cite this article

ZHANG Jin, ZHANG Jianzhong, WANG Fei, et al. Crowd profiling algorithm for mass transit data[J]. Journal of National University of Defense Technology, 2023, 45(2): 55-64.

History
  • Received: 2021-02-26
  • Last revised:
  • Accepted:
  • Published online: 2023-04-03
  • Publication date: 2023-04-28