Cite this article: HUANG HaiQing, ZHANG Ping, ZHANG Xiwen. Modeling of User Preference Based on MDP[J]. Journal of National University of Defense Technology, 2006, 28(6): 81-85.
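Research on MDP Modeling for User Preference Extraction
HUANG HaiQing1, ZHANG Ping1, ZHANG Xiwen2
(1. School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Central Military Representative Office, Second Research Academy of the Ministry of Aerospace, Beijing 100854)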
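Abstract:
By combining the Markov decision process (MDP) with an intelligent reinforcement learning algorithm, this paper presents a technical framework for a model that evaluates users' service preferences in heterogeneous wireless network environments. The framework provides a basic mathematical description for studying how user requirements are perceived, quantified, and adapted to in dynamic environments, and it offers a new line of research on evaluating user experience and on matching services to their service environment. Simulation results show that the constructed MDP model can learn user preferences under multi-state conditions and select services intelligently according to user requirements.
Keywords: utility theory; user preference; Markov decision process; reinforcement learning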
DOI:
Received: 2006-06-25
Funding: National 863 High-Tech Program of China (2003AA12331004)
Modeling of User Preference Based on MDP
HUANG HaiQing1, ZHANG Ping1, ZHANG Xiwen2
(1. School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. The 2nd Institute of China Aerospace Science & Industry, Beijing 100854, China)
Abstract:
A technical architecture for a user preference model is presented, in which the problem is formulated as a Markov decision process (MDP) combined with an adaptive reinforcement learning algorithm. The approach offers a candidate solution for modeling users dynamically so that their expected preferences can be satisfied even when information is minimal or missing. It is also an exploration of how to evaluate user experience when selecting service providers. Simulations of the user models show that the MDP model is effective for learning user preferences with multi-state profiles.
Keywords: utility theory; user preference; Markov decision process; reinforcement learning
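The abstract does not give the paper's states, actions, or reward definition, so the following is only a minimal sketch of the general idea it describes: an agent learns which service a user prefers in each of several states. It uses generic tabular Q-learning, not necessarily the authors' algorithm. The utility table, the function names, the random state transitions, and all parameter values are hypothetical and chosen purely for illustration.

```python
import random


def learn_preferences(utility, episodes=5000, alpha=0.1, gamma=0.5,
                      epsilon=0.2, seed=0):
    """Tabular Q-learning over (user state, service) pairs.

    utility[s][a] is a hypothetical hidden user utility, used as the reward
    for offering service a while the user is in state s.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(utility), len(utility[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = rng.randrange(n_states)
    for _ in range(episodes):
        # Epsilon-greedy: explore occasionally, otherwise offer the
        # service with the highest current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        reward = utility[state][action]
        # Assumption: the user's state changes at random, independently
        # of the service that was offered.
        next_state = rng.randrange(n_states)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the index of the best-estimated service for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Hypothetical utilities: 3 user states (rows) x 3 services (columns).
UTILITY = [
    [1.0, 0.2, 0.1],
    [0.1, 1.0, 0.3],
    [0.2, 0.1, 1.0],
]

q_table = learn_preferences(UTILITY)
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learner never sees the utility table directly; it only observes the reward after each offer, which loosely mirrors learning preferences from minimal information. A realistic model would need state transitions and rewards derived from actual network and user data.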