Cite this article: ZHANG Jin, ZHANG Jianzhong, WANG Fei, et al. Crowd profiling algorithm for mass transit data[J]. Journal of National University of Defense Technology, 2023, 45(2): 55-64.
Crowd profiling algorithm for mass transit data
ZHANG Jin1,2, ZHANG Jianzhong1, WANG Fei3, GUO Qian1
(1. College of Information Science and Engineering, Hunan Normal University, Changsha 410006, China; 2. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; 3. School of Mathematics and Statistics, Hunan Normal University, Changsha 410006, China)

DOI: 10.11887/j.cn.202302006
Submitted: 2021-02-26
Funding: National Ministries and Commissions Foundation of China (31511010105); Natural Science Foundation of Hunan Province (2021JJ30456)

Abstract:
Crowd profiling from massive public transit data is valuable for analyzing the travel characteristics of urban populations and traffic conditions, but processing such data is time-consuming, yields low-quality results, and is difficult to interpret. A systematic strategy for crowd profiling on massive public transit data is proposed. First, the PageRank algorithm is used to select the trajectories of passengers who pass through important stations, which greatly reduces the volume of trajectory data for the target population. Second, a trajectory textualization method is proposed to improve the interpretability of the crowd profiles. Third, K-means based on cosine distance is selected, after analysis, as the clustering algorithm for classifying crowd profiles. Experiments on transit trip data from 30 million passengers show that the proposed strategy addresses crowd profiling on massive transit data in a fairly systematic way, and that K-means based on cosine distance gives the best clustering performance, with an accuracy of about 80%. The crowd profiles and their trajectories are visualized with Flow Map, and the results are consistent with real-world crowd behavior.
Keywords: crowd profiling; PageRank algorithm; trajectory textualization; text clustering
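
The three steps named in the abstract (PageRank-based station filtering, trajectory textualization, and K-means with cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy trips, the station names, the choice of a single top-ranked station, the bag-of-station-tokens reading of "textualization", and the farthest-point initialization are all assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def pagerank(edges, damping=0.85, iters=50):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    nodes = sorted({v for edge in edges for v in edge})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by stations with no outgoing edge is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse token-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def kmeans_cosine(vectors, k, iters=20):
    """K-means that assigns points by cosine distance (assumed variant)."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(vectors,
                  key=lambda v: min(cosine_distance(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)  # adds token counts
                centroids[c] = {t: x / len(members) for t, x in total.items()}
    return labels


# Hypothetical toy data: passenger id -> ordered list of boarded stations.
trips = {
    "p1": ["HomeA", "Hub", "Office"],
    "p2": ["HomeB", "Hub", "Office"],
    "p3": ["Campus", "Hub", "Mall"],
    "p4": ["Campus", "Hub", "Mall", "Cinema"],
    "p5": ["Suburb", "Farm"],
}

# Step 1: rank stations on the graph of consecutive stops, keep the top one,
# and retain only trajectories that pass through it.
edges = [(a, b) for stops in trips.values() for a, b in zip(stops, stops[1:])]
rank = pagerank(edges)
important = set(sorted(rank, key=rank.get, reverse=True)[:1])
kept = {p: s for p, s in trips.items() if important & set(s)}

# Step 2: "textualize" each trajectory as a sentence of station tokens.
docs = {p: " ".join(s) for p, s in kept.items()}
vectors = [Counter(d.split()) for d in docs.values()]

# Step 3: cluster the token-count vectors with cosine-distance K-means.
labels = kmeans_cosine(vectors, k=2)
print(dict(zip(docs, labels)))
```

In this toy setting the trajectory that never touches the top-ranked station is discarded before clustering, which mirrors the data-reduction role the abstract assigns to PageRank. Cosine distance compares the mix of stations visited rather than trip length, so a longer trip with the same station profile stays close to a shorter one.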