Journal of National University of Defense Technology

Cite this article: HUANG HaiQing, ZHANG Ping, ZHANG Xiwen. Modeling of User Preference Based on MDP[J]. Journal of National University of Defense Technology, 2006, 28(6): 81-85.


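Research on MDP Modeling for User Preference Extraction

HUANG Haiqing¹, ZHANG Ping¹, ZHANG Xiwen²

(1. School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Central Military Representative Office, Second Academy of the Ministry of Aerospace Industry, Beijing 100854)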

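Abstract:

By combining Markov decision processes (MDPs) with an intelligent reinforcement learning algorithm, this paper presents a technical framework for a user service-preference evaluation model in heterogeneous wireless network environments. The framework gives a basic mathematical description for studying how user requirements are perceived, quantified, and adapted to in dynamic environments, and it offers a new line of research for evaluating user experience and for adapting services to the service environment. Simulation results show that the constructed MDP model can learn user preferences under multi-state conditions and select services intelligently according to user requirements.

Keywords: utility theory; user preference; Markov decision process; reinforcement learning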


Received: 2006-06-25

Funding: National 863 High-Tech Program of China (2003AA12331004)

Modeling of User Preference Based on MDP

HUANG HaiQing¹, ZHANG Ping¹, ZHANG Xiwen²

(1. School of Telecommunication Engineering, Beijing Univ. of Posts and Telecommunications, Beijing 100876, China; 2. The 2nd Institute of China Aerospace Science & Industry, Beijing 100854, China)

Abstract:

A technical architecture for a user preference model is presented, and the problem is formulated as a Markov decision process (MDP) combined with an adaptive reinforcement learning algorithm. This provides a candidate solution for modeling users dynamically so as to satisfy their expected preferences when information is minimal or missing. It is also an exploration of how to evaluate user experience when selecting service providers. Simulations of the user models show that the MDP model is effective for learning user preferences with multi-state profiles.

Keywords: utility theory; user preference; Markov decision process; reinforcement learning
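
The abstract does not specify the state space, reward design, or learning rule used in the paper. As a rough illustration only of how an MDP combined with reinforcement learning could learn a user's service preference per context, the following sketch uses tabular Q-learning. The state names, service names, utility values, and parameters are all hypothetical and are not taken from the paper; the sketch also assumes that the context changes independently of the chosen service.

```python
import random

# Hypothetical network contexts (states) and services (actions); not from the paper.
STATES = ["wlan_idle", "cellular_busy", "cellular_idle"]
SERVICES = ["video", "voice", "text"]

# Hypothetical hidden user utility for each (state, service) pair.
# The learner only observes it as a reward after choosing a service.
TRUE_UTILITY = {
    "wlan_idle":     {"video": 1.0, "voice": 0.4, "text": 0.2},
    "cellular_busy": {"video": 0.1, "voice": 0.3, "text": 0.9},
    "cellular_idle": {"video": 0.3, "voice": 1.0, "text": 0.4},
}


def learn_preferences(steps=20000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in SERVICES} for s in STATES}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore with probability epsilon, otherwise pick the current best service.
        if rng.random() < epsilon:
            action = rng.choice(SERVICES)
        else:
            action = max(q[state], key=q[state].get)
        reward = TRUE_UTILITY[state][action]
        # Assumption: the next context is drawn uniformly, regardless of the action.
        next_state = rng.choice(STATES)
        # Standard Q-learning update toward reward plus discounted best next value.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued service for each state."""
    return {s: max(q[s], key=q[s].get) for s in q}


policy = greedy_policy(learn_preferences())
```

With these illustrative utilities, the greedy policy should settle on the highest-utility service in each state. A model closer to the paper would presumably replace the fixed utility table with feedback from real user behavior.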